[
https://issues.apache.org/jira/browse/FLINK-31104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694431#comment-17694431
]
dalongliu commented on FLINK-31104:
-----------------------------------
I also hit the query timeout when running TPC-DS on a YARN cluster. The
thread stacks are as follows:
{code:java}
"HashJoin[3174] [Source: store_sales[3210], Source:
household_demographics[3185], Source: store_sales[3144]] -> Calc[3175] ->
HashAggregate[3176] -> Calc[3177] (940/1500)#0" Id=7959 WAITING on
java.util.concurrent.CompletableFuture$Signaller@4a270956
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.CompletableFuture$Signaller@4a270956
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at
org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegmentBlocking(LocalBufferPool.java:384)
at
org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegmentBlocking(LocalBufferPool.java:350)
at
org.apache.flink.runtime.io.network.partition.SortMergeResultPartition.requestNetworkBuffers(SortMergeResultPartition.java:339)
at
org.apache.flink.runtime.io.network.partition.SortMergeResultPartition.createNewDataBuffer(SortMergeResultPartition.java:307)
at
org.apache.flink.runtime.io.network.partition.SortMergeResultPartition.getBroadcastDataBuffer(SortMergeResultPartition.java:302)
at
org.apache.flink.runtime.io.network.partition.SortMergeResultPartition.emit(SortMergeResultPartition.java:256)
at
org.apache.flink.runtime.io.network.partition.SortMergeResultPartition.broadcast(SortMergeResultPartition.java:248)
at
org.apache.flink.runtime.io.network.partition.SortMergeResultPartition.broadcastRecord(SortMergeResultPartition.java:223)
at
org.apache.flink.runtime.io.network.api.writer.BroadcastRecordWriter.broadcastEmit(BroadcastRecordWriter.java:48)
at
org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitWatermark(RecordWriterOutput.java:121)
at
org.apache.flink.streaming.api.operators.CountingOutput.emitWatermark(CountingOutput.java:43)
at
org.apache.flink.streaming.api.operators.AbstractStreamOperator.processWatermark(AbstractStreamOperator.java:604)
at
org.apache.flink.table.runtime.operators.TableStreamOperator.processWatermark(TableStreamOperator.java:57)
at
org.apache.flink.streaming.runtime.tasks.ChainingOutput.emitWatermark(ChainingOutput.java:107)
at
org.apache.flink.streaming.api.operators.AbstractStreamOperator.processWatermark(AbstractStreamOperator.java:604)
at
org.apache.flink.table.runtime.operators.TableStreamOperator.processWatermark(TableStreamOperator.java:57)
at
org.apache.flink.streaming.runtime.tasks.ChainingOutput.emitWatermark(ChainingOutput.java:107)
at
org.apache.flink.streaming.api.operators.AbstractStreamOperator.processWatermark(AbstractStreamOperator.java:604)
at
org.apache.flink.table.runtime.operators.TableStreamOperator.processWatermark(TableStreamOperator.java:57)
at
org.apache.flink.streaming.runtime.tasks.ChainingOutput.emitWatermark(ChainingOutput.java:107)
at
org.apache.flink.streaming.api.operators.AbstractStreamOperator.processWatermark(AbstractStreamOperator.java:604)
at
org.apache.flink.table.runtime.operators.TableStreamOperator.processWatermark(TableStreamOperator.java:57)
at
org.apache.flink.streaming.api.operators.AbstractStreamOperator.processWatermark(AbstractStreamOperator.java:609)
at
org.apache.flink.streaming.api.operators.AbstractStreamOperator.processWatermark2(AbstractStreamOperator.java:618)
at
org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory$StreamTaskNetworkOutput.emitWatermark(StreamTwoInputProcessorFactory.java:268)
at
org.apache.flink.streaming.runtime.watermarkstatus.StatusWatermarkValve.findAndOutputNewMinWatermarkAcrossAlignedChannels(StatusWatermarkValve.java:199)
at
org.apache.flink.streaming.runtime.watermarkstatus.StatusWatermarkValve.inputWatermark(StatusWatermarkValve.java:114)
at
org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:148)
at
org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:110)
at
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
at
org.apache.flink.streaming.runtime.io.StreamMultipleInputProcessor.processInput(StreamMultipleInputProcessor.java:85)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:547)
at
org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$653/803691740.runDefaultAction(Unknown
Source)
at
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:834)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:783)
at
org.apache.flink.runtime.taskmanager.Task$$Lambda$1273/1500345048.run(Unknown
Source)
at
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
at java.lang.Thread.run(Thread.java:750)

"System Time Trigger for
HashJoin[3174] [Source: store_sales[3210], Source:
household_demographics[3185], Source: store_sales[3144]] -> Calc[3175] ->
HashAggregate[3176] -> Calc[3177] (707/1500)#0" Id=7955 WAITING on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5358647
at sun.misc.Unsafe.park(Native Method)
- waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5358647
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
at
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
{code}
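For readers less familiar with this state: the first thread above is parked
inside CompletableFuture.get(), called from
LocalBufferPool.requestMemorySegmentBlocking, and stays WAITING until
something completes that future. The following is only a minimal sketch of
that pattern in plain Java, not Flink code. TinyPool, requestBlocking,
recycle and observeRequesterState are hypothetical names made up for the
illustration, and the real buffer pool logic is considerably more involved.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class BlockedBufferRequestSketch {

    /** Hypothetical stand-in for a buffer pool; this is NOT Flink's LocalBufferPool. */
    static final class TinyPool {
        // Completed only when a buffer is handed back to the pool.
        private final CompletableFuture<Void> available = new CompletableFuture<>();

        /** Blocks the calling thread until another thread calls recycle(). */
        void requestBlocking() throws Exception {
            available.get();
        }

        /** Simulates a buffer being returned, which releases any waiting requester. */
        void recycle() {
            available.complete(null);
        }
    }

    /** Starts a requester thread and returns the state it settles in (polls for up to 5 s). */
    static Thread.State observeRequesterState(TinyPool pool) throws InterruptedException {
        Thread requester = new Thread(() -> {
            try {
                pool.requestBlocking();
            } catch (Exception e) {
                Thread.currentThread().interrupt();
            }
        }, "requester");
        requester.setDaemon(true);
        requester.start();

        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (requester.getState() != Thread.State.WAITING
                && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        return requester.getState();
    }

    public static void main(String[] args) throws Exception {
        TinyPool pool = new TinyPool();
        // Nobody recycles a buffer yet, so the requester parks in get().
        System.out.println("requester state: " + observeRequesterState(pool));
        // Returning a buffer completes the future and lets the requester finish.
        pool.recycle();
    }
}
```

If nothing ever plays the role of recycle(), the requester never leaves
get(), which is consistent with a job that hangs until the test times out.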
> TPC-DS test timed out in query 36
> ---------------------------------
>
> Key: FLINK-31104
> URL: https://issues.apache.org/jira/browse/FLINK-31104
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Runtime, Tests
> Affects Versions: 1.17.0
> Reporter: Matthias Pohl
> Priority: Blocker
> Labels: test-stability
>
> A timeout occurred in
> [apache-flink:flink-end-to-end-tests/flink-tpcds-test/tpcds-tool/query/query36.sql|https://github.com/apache/flink/blob/20c983c26262057c4d59bd591aed89969a8ff525/flink-end-to-end-tests/flink-tpcds-test/tpcds-tool/query/query36.sql]
> of the TPC-DS test suite:
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46202&view=logs&j=6e8542d7-de38-5a33-4aca-458d6c87066d&t=5846934b-7a4f-545b-e5b0-eb4d8bda32e1&l=880
> {code}
> [...]
> Feb 16 04:58:23 [INFO]Run TPC-DS query 36 ...
> Feb 16 04:58:23 Job has been submitted with JobID
> 4d0c1e6cbde9f0b6ae8b9f9afd159c06
> {code}
> Unfortunately, no further logs are provided.