[ 
https://issues.apache.org/jira/browse/FLINK-19982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237818#comment-17237818
 ] 

Caizhi Weng edited comment on FLINK-19982 at 11/24/20, 3:22 AM:
----------------------------------------------------------------

[~rmetzger] [~jark] I currently have no idea why this is happening. The warning 
are not related because they also appear in other tests, and collect iterator 
is designed to be tolerable for such exception. But as such failure only 
happens once I'm suspecting that this is a very rare unstable situation.

However I would like to change the debug log in line 388 of 
{{CollectSinkFunction.java}} to an info log for future investigation. If 
similar exceptions happen again please inform me.


was (Author: tsreaper):
[~rmetzger] [~jark] I currently have no idea why this is happening. The warning 
are not related because they also appear in other tests, and collect iterator 
is designed to be exception tolerable. But as such failure only happens once 
I'm suspecting that this is a very rare unstable situation.

However I would like to change the debug log in line 388 of 
{{CollectSinkFunction.java}} to an info log for future investigation. If 
similar exceptions happen again please inform me.

> AggregateReduceGroupingITCase.testSingleAggOnTable_SortAgg fails with 
> "RuntimeException: Job restarted"
> -------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-19982
>                 URL: https://issues.apache.org/jira/browse/FLINK-19982
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Runtime
>    Affects Versions: 1.12.0
>            Reporter: Robert Metzger
>            Assignee: Caizhi Weng
>            Priority: Major
>              Labels: pull-request-available, test-stability
>             Fix For: 1.12.0
>
>
> https://dev.azure.com/georgeryan1322/Flink/_build/results?buildId=336&view=logs&j=a1590513-d0ea-59c3-3c7b-aad756c48f25&t=5129dea2-618b-5c74-1b8f-9ec63a37a8a6
> {code}
> [ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 59.688 s <<< FAILURE! - in 
> org.apache.flink.table.planner.runtime.batch.sql.agg.AggregateReduceGroupingITCase
> [ERROR] 
> testSingleAggOnTable_SortAgg(org.apache.flink.table.planner.runtime.batch.sql.agg.AggregateReduceGroupingITCase)
>   Time elapsed: 2.789 s  <<< ERROR!
> java.lang.RuntimeException: Job restarted
>       at 
> org.apache.flink.streaming.api.operators.collect.UncheckpointedCollectResultBuffer.sinkRestarted(UncheckpointedCollectResultBuffer.java:41)
>       at 
> org.apache.flink.streaming.api.operators.collect.AbstractCollectResultBuffer.dealWithResponse(AbstractCollectResultBuffer.java:87)
>       at 
> org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:127)
>       at 
> org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:103)
>       at 
> org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:77)
>       at 
> org.apache.flink.table.planner.sinks.SelectTableSinkBase$RowIteratorWrapper.hasNext(SelectTableSinkBase.java:115)
>       at 
> org.apache.flink.table.api.internal.TableResultImpl$CloseableRowIteratorWrapper.hasNext(TableResultImpl.java:355)
>       at java.util.Iterator.forEachRemaining(Iterator.java:115)
>       at 
> org.apache.flink.util.CollectionUtil.iteratorToList(CollectionUtil.java:114)
>       at 
> org.apache.flink.table.planner.runtime.utils.BatchTestBase.executeQuery(BatchTestBase.scala:298)
>       at 
> org.apache.flink.table.planner.runtime.utils.BatchTestBase.check(BatchTestBase.scala:138)
>       at 
> org.apache.flink.table.planner.runtime.utils.BatchTestBase.checkResult(BatchTestBase.scala:104)
>       at 
> org.apache.flink.table.planner.runtime.batch.sql.agg.AggregateReduceGroupingITCase.testSingleAggOnTable(AggregateReduceGroupingITCase.scala:153)
>       at 
> org.apache.flink.table.planner.runtime.batch.sql.agg.AggregateReduceGroupingITCase.testSingleAggOnTable_SortAgg(AggregateReduceGroupingITCase.scala:122)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> {code}
> In the logs, I find occurrences of this:
> {code}
> 16:37:49,262 [                main] WARN  
> org.apache.flink.streaming.api.operators.collect.CollectResultFetcher [] - An 
> exception occurs when fetching query results
> java.util.concurrent.ExecutionException: 
> org.apache.flink.runtime.dispatcher.UnavailableDispatcherOperationException: 
> Unable to get JobMasterGateway for initializing job. The requested operation 
> is not available while the JobManager is initializing.
>       at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) 
> ~[?:1.8.0_242]
>       at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) 
> ~[?:1.8.0_242]
>       at 
> org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.sendRequest(CollectResultFetcher.java:163)
>  ~[flink-streaming-java_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:134)
>  [flink-streaming-java_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:103)
>  [flink-streaming-java_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:77)
>  [flink-streaming-java_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.table.planner.sinks.SelectTableSinkBase$RowIteratorWrapper.hasNext(SelectTableSinkBase.java:115)
>  [classes/:?]
>       at 
> org.apache.flink.table.api.internal.TableResultImpl$CloseableRowIteratorWrapper.hasNext(TableResultImpl.java:355)
>  [flink-table-api-java-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at java.util.Iterator.forEachRemaining(Iterator.java:115) [?:1.8.0_242]
>       at 
> org.apache.flink.util.CollectionUtil.iteratorToList(CollectionUtil.java:114) 
> [flink-core-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.table.planner.runtime.utils.BatchTestBase.executeQuery(BatchTestBase.scala:298)
>  [test-classes/:?]
>       at 
> org.apache.flink.table.planner.runtime.utils.BatchTestBase.check(BatchTestBase.scala:138)
>  [test-classes/:?]
>       at 
> org.apache.flink.table.planner.runtime.utils.BatchTestBase.checkResult(BatchTestBase.scala:104)
>  [test-classes/:?]
>       at 
> org.apache.flink.table.planner.runtime.batch.sql.agg.AggregateReduceGroupingITCase.testSingleAggOnTable(AggregateReduceGroupingITCase.scala:153)
>  [test-classes/:?]
>       at 
> org.apache.flink.table.planner.runtime.batch.sql.agg.AggregateReduceGroupingITCase.testSingleAggOnTable_SortAgg(AggregateReduceGroupingITCase.scala:122)
>  [test-classes/:?]
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_242]
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_242]
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_242]
>       at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_242]
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  [junit-4.12.jar:4.12]
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  [junit-4.12.jar:4.12]
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  [junit-4.12.jar:4.12]
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  [junit-4.12.jar:4.12]
>       at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> [junit-4.12.jar:4.12]
>       at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> [junit-4.12.jar:4.12]
>       at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) 
> [junit-4.12.jar:4.12]
>       at org.junit.rules.RunRules.evaluate(RunRules.java:20) 
> [junit-4.12.jar:4.12]
>       at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) 
> [junit-4.12.jar:4.12]
>       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  [junit-4.12.jar:4.12]
>       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  [junit-4.12.jar:4.12]
>       at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) 
> [junit-4.12.jar:4.12]
>       at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) 
> [junit-4.12.jar:4.12]
>       at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) 
> [junit-4.12.jar:4.12]
>       at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) 
> [junit-4.12.jar:4.12]
>       at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) 
> [junit-4.12.jar:4.12]
>       at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) 
> [junit-4.12.jar:4.12]
>       at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) 
> [junit-4.12.jar:4.12]
>       at org.junit.rules.RunRules.evaluate(RunRules.java:20) 
> [junit-4.12.jar:4.12]
>       at org.junit.runners.ParentRunner.run(ParentRunner.java:363) 
> [junit-4.12.jar:4.12]
>       at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>  [surefire-junit4-2.22.1.jar:2.22.1]
>       at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>  [surefire-junit4-2.22.1.jar:2.22.1]
>       at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>  [surefire-junit4-2.22.1.jar:2.22.1]
>       at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>  [surefire-junit4-2.22.1.jar:2.22.1]
>       at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>  [surefire-booter-2.22.1.jar:2.22.1]
>       at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>  [surefire-booter-2.22.1.jar:2.22.1]
>       at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) 
> [surefire-booter-2.22.1.jar:2.22.1]
>       at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) 
> [surefire-booter-2.22.1.jar:2.22.1]
> Caused by: 
> org.apache.flink.runtime.dispatcher.UnavailableDispatcherOperationException: 
> Unable to get JobMasterGateway for initializing job. The requested operation 
> is not available while the JobManager is initializing.
>       at 
> org.apache.flink.runtime.dispatcher.Dispatcher.getJobMasterGateway(Dispatcher.java:793)
>  ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.runtime.dispatcher.Dispatcher.performOperationOnJobMasterGateway(Dispatcher.java:800)
>  ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.runtime.dispatcher.Dispatcher.deliverCoordinationRequestToCoordinator(Dispatcher.java:631)
>  ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at sun.reflect.GeneratedMethodAccessor65.invoke(Unknown Source) ~[?:?]
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_242]
>       at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_242]
>       at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:286)
>  ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:201)
>  ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>  ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
>  ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
> ~[akka-actor_2.11-2.5.21.jar:2.5.21]
>       at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
> ~[akka-actor_2.11-2.5.21.jar:2.5.21]
>       at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
> ~[scala-library-2.11.12.jar:?]
>       at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
> ~[akka-actor_2.11-2.5.21.jar:2.5.21]
>       at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
> ~[scala-library-2.11.12.jar:?]
>       at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> ~[scala-library-2.11.12.jar:?]
>       at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> ~[scala-library-2.11.12.jar:?]
>       at akka.actor.Actor$class.aroundReceive(Actor.scala:517) 
> ~[akka-actor_2.11-2.5.21.jar:2.5.21]
>       at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) 
> ~[akka-actor_2.11-2.5.21.jar:2.5.21]
>       at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) 
> ~[akka-actor_2.11-2.5.21.jar:2.5.21]
>       at akka.actor.ActorCell.invoke(ActorCell.scala:561) 
> ~[akka-actor_2.11-2.5.21.jar:2.5.21]
>       at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) 
> ~[akka-actor_2.11-2.5.21.jar:2.5.21]
>       at akka.dispatch.Mailbox.run(Mailbox.scala:225) 
> ~[akka-actor_2.11-2.5.21.jar:2.5.21]
>       at akka.dispatch.Mailbox.exec(Mailbox.scala:235) 
> ~[akka-actor_2.11-2.5.21.jar:2.5.21]
>       at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
> ~[akka-actor_2.11-2.5.21.jar:2.5.21]
>       at 
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) 
> ~[akka-actor_2.11-2.5.21.jar:2.5.21]
>       at 
> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) 
> ~[akka-actor_2.11-2.5.21.jar:2.5.21]
>       at 
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>  ~[akka-actor_2.11-2.5.21.jar:2.5.21]
> {code}
> These are just WARNings, not sure if it is related or not.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to