[
https://issues.apache.org/jira/browse/FLINK-32751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751105#comment-17751105
]
Matthias Pohl commented on FLINK-32751:
---------------------------------------
The actual test failure happened in
{{SortDistinctAggregateITCase#testMultiDistinctAggOnDifferentColumn}}, which
inherits the test from {{DistinctAggregateITCaseBase}}.
FYI: This can be determined from the Surefire output, which reports that
{{HashDistinctAggregateITCase}} completed but
{{SortDistinctAggregateITCase}} didn't.
{code}
[...]
Aug 04 02:12:29 02:12:29.073 [INFO] Running
org.apache.flink.table.planner.runtime.batch.sql.agg.SortDistinctAggregateITCase
[...]
Aug 04 02:19:04 02:19:04.720 [INFO] Running
org.apache.flink.table.planner.runtime.batch.sql.agg.HashDistinctAggregateITCase
Aug 04 02:20:38 02:20:38.255 [INFO] Tests run: 23, Failures: 0, Errors: 0,
Skipped: 0, Time elapsed: 93.527 s - in
org.apache.flink.table.planner.runtime.batch.sql.agg.HashDistinctAggregateITCase
[...]
{code}
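The same check can be scripted: collect every class that has a "Running" line
and drop those that later get a "Tests run: ... - in" summary. The following is
only a minimal sketch. It assumes each Surefire message sits on a single line
(the excerpt above is wrapped by the mail) and that the summary uses the
"- in <class>" form shown above. The class name {{UnfinishedSurefireTests}} and
the method {{unfinished}} are hypothetical helpers, not part of Flink or
Surefire.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnfinishedSurefireTests {
    // "[INFO] Running <class>" marks the start of a test class.
    private static final Pattern RUNNING =
            Pattern.compile("\\[INFO\\] Running (\\S+)");
    // "Tests run: ... - in <class>" marks its completion (pass or fail).
    private static final Pattern FINISHED =
            Pattern.compile("Tests run: .* - in (\\S+)");

    /** Returns the classes that started but never printed a summary line. */
    public static Set<String> unfinished(List<String> logLines) {
        Set<String> open = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher started = RUNNING.matcher(line);
            if (started.find()) {
                open.add(started.group(1));
                continue;
            }
            Matcher finished = FINISHED.matcher(line);
            if (finished.find()) {
                open.remove(finished.group(1));
            }
        }
        return open;
    }

    public static void main(String[] args) {
        String pkg = "org.apache.flink.table.planner.runtime.batch.sql.agg.";
        List<String> log = List.of(
                "Aug 04 02:12:29 02:12:29.073 [INFO] Running " + pkg
                        + "SortDistinctAggregateITCase",
                "Aug 04 02:19:04 02:19:04.720 [INFO] Running " + pkg
                        + "HashDistinctAggregateITCase",
                "Aug 04 02:20:38 02:20:38.255 [INFO] Tests run: 23, Failures: 0, "
                        + "Errors: 0, Skipped: 0, Time elapsed: 93.527 s - in " + pkg
                        + "HashDistinctAggregateITCase");
        // Only SortDistinctAggregateITCase remains without a summary line.
        System.out.println(unfinished(log));
    }
}
```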
The issue we're seeing seems to be independent of the actual test, though.
The timeout happens when the {{CollectDynamicSink}} tries to request more data
through the Dispatcher which forwards the request. Unfortunately, we don't have
any logs from the Dispatcher side of that request. Therefore, we cannot
reliably say where the request halted.
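For illustration: in the thread dump quoted below, the caller is parked in
{{CompletableFuture.get}} via {{waitingGet}}, which suggests an untimed wait.
If the response is never delivered, such a call does not return on its own.
The sketch below uses only JDK classes and is not Flink code.
{{SwallowedResponseDemo}} and {{awaitResponse}} are hypothetical names. It
contrasts the untimed wait with a bounded one that reports the missing
response.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SwallowedResponseDemo {

    /**
     * Waits for the response with a deadline. Returns the response value, or
     * the string "timeout" if nothing arrived in time.
     */
    public static String awaitResponse(
            CompletableFuture<String> response, long timeoutMillis) {
        try {
            return response.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "timeout";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("request failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        // Stands in for a request whose response message was lost.
        CompletableFuture<String> neverAnswered = new CompletableFuture<>();

        // A bounded wait gives up after the deadline.
        System.out.println(awaitResponse(neverAnswered, 100));

        // By contrast, neverAnswered.get() without a timeout would park this
        // thread until something else completes the future.
    }
}
```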
[~Sergey Nuyanzin] has a point in noting that several past Jira issues
(FLINK-20254, FLINK-22129, FLINK-22181, FLINK-22100) had a similar stack
trace. These issues were handled as duplicates of FLINK-21996, which was a
bug in the RPC layer with messages being swallowed. In the end, it's
strange that the request wasn't completed in some way due to the
{{MiniCluster}} having been shut down.
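One reading of that last expectation is that a component shutting down should
fail its outstanding requests instead of leaving them pending. The sketch
below shows that pattern with plain JDK classes. It is an assumption about
the expected behaviour, not the {{MiniCluster}} or Dispatcher implementation.
{{PendingRequests}} and its methods are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class PendingRequests {
    // Futures handed out to callers that have not been completed yet.
    private final Set<CompletableFuture<String>> pending =
            ConcurrentHashMap.newKeySet();

    /** Registers a new outstanding request and returns its future. */
    public CompletableFuture<String> register() {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.add(future);
        // Stop tracking the future once it completes in any way.
        future.whenComplete((result, error) -> pending.remove(future));
        return future;
    }

    /** Fails every outstanding request so that no caller stays blocked. */
    public void shutDown() {
        for (CompletableFuture<String> future : pending) {
            future.completeExceptionally(
                    new IllegalStateException("shut down before a response arrived"));
        }
    }

    public int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingRequests requests = new PendingRequests();
        CompletableFuture<String> response = requests.register();
        requests.shutDown();
        // A caller blocked in response.get() would now see an exception.
        System.out.println(
                "completed exceptionally: " + response.isCompletedExceptionally());
    }
}
```

The sketch does not handle a request that is registered concurrently with
{{shutDown}}; a real implementation would also need to reject new
registrations once shutdown has started.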
> DistinctAggregateITCaseBase.testMultiDistinctAggOnDifferentColumn got stuck
> on AZP
> ----------------------------------------------------------------------------------
>
> Key: FLINK-32751
> URL: https://issues.apache.org/jira/browse/FLINK-32751
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / API
> Affects Versions: 1.18.0
> Reporter: Sergey Nuyanzin
> Priority: Critical
> Labels: test-stability
>
> This build hangs
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51955&view=logs&j=ce3801ad-3bd5-5f06-d165-34d37e757d90&t=5e4d9387-1dcc-5885-a901-90469b7e6d2f&l=14399
> {noformat}
> Aug 04 03:03:47 "ForkJoinPool-1-worker-51" #28 daemon prio=5 os_prio=0
> cpu=49342.66ms elapsed=3079.49s tid=0x00007f67ccdd0000 nid=0x5234 waiting on
> condition [0x00007f6791a19000]
> Aug 04 03:03:47 java.lang.Thread.State: WAITING (parking)
> Aug 04 03:03:47 at
> jdk.internal.misc.Unsafe.park([email protected]/Native Method)
> Aug 04 03:03:47 - parking to wait for <0x00000000ad3b1fb8> (a
> java.util.concurrent.CompletableFuture$Signaller)
> Aug 04 03:03:47 at
> java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
> Aug 04 03:03:47 at
> java.util.concurrent.CompletableFuture$Signaller.block([email protected]/CompletableFuture.java:1796)
> Aug 04 03:03:47 at
> java.util.concurrent.ForkJoinPool.managedBlock([email protected]/ForkJoinPool.java:3118)
> Aug 04 03:03:47 at
> java.util.concurrent.CompletableFuture.waitingGet([email protected]/CompletableFuture.java:1823)
> Aug 04 03:03:47 at
> java.util.concurrent.CompletableFuture.get([email protected]/CompletableFuture.java:1998)
> Aug 04 03:03:47 at
> org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.sendRequest(CollectResultFetcher.java:171)
> Aug 04 03:03:47 at
> org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:129)
> Aug 04 03:03:47 at
> org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:106)
> Aug 04 03:03:47 at
> org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:80)
> Aug 04 03:03:47 at
> org.apache.flink.table.planner.connectors.CollectDynamicSink$CloseableRowIteratorWrapper.hasNext(CollectDynamicSink.java:222)
> Aug 04 03:03:47 at
> java.util.Iterator.forEachRemaining([email protected]/Iterator.java:132)
> Aug 04 03:03:47 at
> org.apache.flink.util.CollectionUtil.iteratorToList(CollectionUtil.java:122)
> Aug 04 03:03:47 at
> org.apache.flink.table.planner.runtime.utils.BatchTestBase.executeQuery(BatchTestBase.scala:309)
> Aug 04 03:03:47 at
> org.apache.flink.table.planner.runtime.utils.BatchTestBase.check(BatchTestBase.scala:145)
> Aug 04 03:03:47 at
> org.apache.flink.table.planner.runtime.utils.BatchTestBase.checkResult(BatchTestBase.scala:109)
> Aug 04 03:03:47 at
> org.apache.flink.table.planner.runtime.batch.sql.agg.DistinctAggregateITCaseBase.testMultiDistinctAggOnDifferentColumn(DistinctAggregateITCaseBase.scala:97)
> ~~
> {noformat}
> This is very likely an old issue.
> A similar case was reported for 1.11.0 (FLINK-16923) and closed for lack of
> occurrences.
> Another similar one, FLINK-22100, was marked as a duplicate of
> FLINK-21996.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)