[ 
https://issues.apache.org/jira/browse/BEAM-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704904#comment-16704904
 ] 

Maximilian Michels commented on BEAM-6111:
------------------------------------------

This is caused by executing the state completion request in 
{{GrpcStateService}} asynchronously with the default threadpool. I think this 
wasn't intended because {{whenCompleteAsync}} was used instead of 
{{whenComplete}}. The default thread pool in the test is Flink's thread pool 
which doesn't take care to log exceptions.

https://github.com/apache/beam/blob/c526f6bf62a1d63c7181eb7252c134e42d5c8677/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/GrpcStateService.java#L82

With the current implementation the StateRequest will always be a completed 
CompleteableFuture anyways, so there is no need to schedule asynchronously.

The following exception was thrown here: 
https://github.com/apache/beam/blob/c526f6bf62a1d63c7181eb7252c134e42d5c8677/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/GrpcStateService.java#L85

{noformat}
java.lang.IllegalStateException: sendHeaders has already been called
    at 
org.apache.beam.vendor.grpc.v1_13_1.com.google.common.base.Preconditions.checkState(Preconditions.java:444)
    at 
org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.ServerCallImpl.sendHeaders(ServerCallImpl.java:88)
    at 
org.apache.beam.vendor.grpc.v1_13_1.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:338)
    at 
org.apache.beam.runners.fnexecution.state.GrpcStateService$Inbound.lambda$onNext$0(GrpcStateService.java:142)
    at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at 
java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:443)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) 
{noformat}

> org.apache.beam.runners.flink.PortableTimersExecutionTest is very flakey 
> 40/50 runs failed
> ------------------------------------------------------------------------------------------
>
>                 Key: BEAM-6111
>                 URL: https://issues.apache.org/jira/browse/BEAM-6111
>             Project: Beam
>          Issue Type: Test
>          Components: runner-flink
>            Reporter: Alex Amato
>            Assignee: Maximilian Michels
>            Priority: Major
>              Labels: flake
>
> I ran this test 50 on master and it failed about 40/50 runs. I have also seen 
> this failing in jenkins presubmit.
> build scan links from java presubmit
> https://builds.apache.org/job/beam_PreCommit_Java_Commit/2654/
> build scan links when I ran locally:
> [https://scans.gradle.com/s/hzu3bojfdbd5s]
> [https://scans.gradle.com/s/txhyj6acebkko]
> [https://scans.gradle.com/s/nhju5jthjbdds]
> org.junit.runners.model.TestTimedOutException: test timed out after 60000 
> milliseconds
> at java.lang.ClassLoader.findBootstrapClass(Native Method)
> at java.lang.ClassLoader.findBootstrapClassOrNull(ClassLoader.java:1015)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:413)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:339)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:13)
> at org.junit.Assert.assertThat(Assert.java:956)
> at org.junit.Assert.assertThat(Assert.java:923)
> at 
> org.apache.beam.runners.flink.PortableTimersExecutionTest.testTimerExecution(PortableTimersExecutionTest.java:191)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to