[
https://issues.apache.org/jira/browse/BEAM-10955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370940#comment-17370940
]
Valentyn Tymofieiev commented on BEAM-10955:
--------------------------------------------
I thought perhaps we could bisect to the breaking change, so I tried
reproducing this via:
{noformat}
./gradlew runners:flink:1.11:test \
  --tests "org.apache.beam.runners.flink.FlinkSavepointTest.testSavepointRestoreLegacy" \
  --info
{noformat}
with this patch to re-run the test every time:
{noformat}
index 9c8ee14c34..b3e517b6aa 100644
--- a/runners/flink/flink_runner.gradle
+++ b/runners/flink/flink_runner.gradle
@@ -118,6 +118,7 @@ test {
}
mustRunAfter(":runners:flink:${version}:test")
}
+ outputs.upToDateWhen { false }
}
{noformat}
Not sure what's happening. Perhaps this is caused by side effects from other
tests, or by Flink processes occasionally running out of memory, for example
when Jenkins happens to run another test suite in parallel and there is not
enough memory left.
Given that only the 'Legacy' tests seem to be failing, temporarily disabling
the flaky test SGTM.
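For anyone who wants to estimate how often the flake reproduces locally, the command above could be wrapped in a small retry loop. This is only a sketch: `run_until_failure` is a hypothetical helper, not something that exists in the Beam repo, and it assumes the `upToDateWhen` patch above is applied so that Gradle actually re-executes the test on every run.

```shell
# Hypothetical helper (not part of Beam): rerun a command until it fails,
# or stop after max_runs successful attempts.
run_until_failure() {
  max_runs="$1"
  shift
  i=1
  while [ "$i" -le "$max_runs" ]; do
    # "$@" is the command under test, e.g. the ./gradlew invocation above.
    if ! "$@"; then
      echo "failed on run $i"
      return 1
    fi
    i=$((i + 1))
  done
  echo "passed $max_runs runs"
}

# Example invocation from the Beam repo root (assumes the patch above):
# run_until_failure 20 ./gradlew runners:flink:1.11:test \
#   --tests "org.apache.beam.runners.flink.FlinkSavepointTest.testSavepointRestoreLegacy" \
#   --info
```

The run count of 20 is arbitrary. A flake that depends on memory pressure from parallel Jenkins jobs may never show up on an otherwise idle machine, so a clean result here would not rule that hypothesis out.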
> Flink Java Runner test flake: Could not find Flink job
> -------------------------------------------------------
>
> Key: BEAM-10955
> URL: https://issues.apache.org/jira/browse/BEAM-10955
> Project: Beam
> Issue Type: Bug
> Components: runner-flink
> Reporter: Valentyn Tymofieiev
> Priority: P1
> Labels: currently-failing, flake
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Opening to track how frequent this is. Observed on
> org.apache.beam.runners.flink.FlinkSavepointTest.testSavepointRestoreLegacy
> in https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/13716
> {noformat}
> java.util.concurrent.ExecutionException:
> org.apache.flink.runtime.messages.FlinkJobNotFoundException: Could not find
> Flink job (6eddfd5820521f0898920a538c4a82dd)
> Stacktrace
> java.util.concurrent.ExecutionException:
> org.apache.flink.runtime.messages.FlinkJobNotFoundException: Could not find
> Flink job (6eddfd5820521f0898920a538c4a82dd)
> at
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> at
> org.apache.beam.runners.flink.FlinkSavepointTest.takeSavepointAndCancelJob(FlinkSavepointTest.java:253)
> at
> org.apache.beam.runners.flink.FlinkSavepointTest.runSavepointAndRestore(FlinkSavepointTest.java:172)
> at
> org.apache.beam.runners.flink.FlinkSavepointTest.testSavepointRestoreLegacy(FlinkSavepointTest.java:144)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
> at
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.runtime.messages.FlinkJobNotFoundException: Could
> not find Flink job (6eddfd5820521f0898920a538c4a82dd)
> at
> org.apache.flink.runtime.dispatcher.Dispatcher.getJobMasterGatewayFuture(Dispatcher.java:803)
> at
> org.apache.flink.runtime.dispatcher.Dispatcher.triggerSavepoint(Dispatcher.java:582)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:274)
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:189)
> at
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147)
> at
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
> at
> akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
> at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
> at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
> at akka.actor.ActorCell.invoke(ActorCell.scala:495)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
> at akka.dispatch.Mailbox.run(Mailbox.scala:224)
> at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Standard Output
> Shutting SDK harness down.
> Shutting SDK harness down.
> Standard Error
> [Test worker] INFO org.apache.beam.runners.flink.FlinkSavepointTest -
> Savepoints will be written to file:///tmp/junit5733722483577894112
> [Test worker] INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils - The
> configuration option Key: 'taskmanager.cpu.cores' , default: null (fallback
> keys: []) required for local execution is not set, setting it to its default
> value 1.7976931348623157E308
> [Test worker] INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils - The con
> ...[truncated 527576 chars]...
> flink-metrics-2] INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService -
> Stopped Akka RPC service.
> [flink-metrics-2] INFO org.apache.flink.runtime.blob.PermanentBlobCache -
> Shutting down BLOB cache
> [flink-metrics-2] INFO org.apache.flink.runtime.blob.TransientBlobCache -
> Shutting down BLOB cache
> [flink-metrics-2] INFO org.apache.flink.runtime.blob.BlobServer - Stopped
> BLOB server at 0.0.0.0:46101
> [flink-metrics-2] INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService -
> Stopped Akka RPC service.
> {noformat}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)