[
https://issues.apache.org/jira/browse/FLINK-31104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695585#comment-17695585
]
Weijie Guo edited comment on FLINK-31104 at 3/2/23 9:33 AM:
------------------------------------------------------------
Hi all,
To be honest, the root cause of this ticket is difficult to investigate because
not enough context is provided here, such as a thread dump or heap dump.
However, I have reproduced the TPC-DS timeout on our internal cluster, and I
suspect it has the same root cause as this ticket.
After some investigation, I found that we do have a bug in {{LocalBufferPool}},
but it is most likely caused by FLINK-26762, which has been in master since
1.16.
To fix this bug, I created FLINK-31293. I also found that in batch scenarios we
do not actually need the overdraft buffer, so I will disable it in FLINK-31288,
which should make our TPC-DS test stable.
Since the problem was not introduced in 1.17 and can be temporarily fixed
through FLINK-31288, I suggest lowering the priority to unblock the 1.17 release
process. [~renqs] [~mapohl] WDYT?
> TPC-DS test timed out in query 36
> ---------------------------------
>
> Key: FLINK-31104
> URL: https://issues.apache.org/jira/browse/FLINK-31104
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Runtime, Tests
> Affects Versions: 1.17.0
> Reporter: Matthias Pohl
> Assignee: Weijie Guo
> Priority: Blocker
> Labels: test-stability
>
> A timeout occurred in
> [apache-flink:flink-end-to-end-tests/flink-tpcds-test/tpcds-tool/query/query36.sql|https://github.com/apache/flink/blob/20c983c26262057c4d59bd591aed89969a8ff525/flink-end-to-end-tests/flink-tpcds-test/tpcds-tool/query/query36.sql]
> of the TPC-DS test suite:
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46202&view=logs&j=6e8542d7-de38-5a33-4aca-458d6c87066d&t=5846934b-7a4f-545b-e5b0-eb4d8bda32e1&l=880
> {code}
> [...]
> Feb 16 04:58:23 [INFO]Run TPC-DS query 36 ...
> Feb 16 04:58:23 Job has been submitted with JobID
> 4d0c1e6cbde9f0b6ae8b9f9afd159c06
> {code}
> Unfortunately, no further logs are provided.