[ https://issues.apache.org/jira/browse/BEAM-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438233#comment-17438233 ]
Luke Cwik commented on BEAM-13164:
----------------------------------
Looks like the SDK harness is trying to connect to Spark on an unknown
endpoint. This implies that the server does not know about the endpoint it told
the SDK harness to connect to.
{noformat}
21/11/03 11:11:10 INFO org.apache.beam.runners.fnexecution.logging.GrpcLoggingService: 1 Beam Fn Logging clients still connected during shutdown.
21/11/03 11:11:10 WARN org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer: Hanged up for unknown endpoint.
21/11/03 11:11:10 ERROR org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer2: Failed to handle for url: "InProcessServer_328"
org.apache.beam.vendor.grpc.v1p36p0.io.grpc.StatusRuntimeException: CANCELLED: Multiplexer hanging up
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Status.asRuntimeException(Status.java:535)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:478)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:553)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:68)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:739)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:718)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Shutting SDK harness down.
21/11/03 11:11:21 WARN org.apache.spark.executor.Executor: Issue communicating with driver in heartbeater
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:103)
at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:87)
at org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:78)
at org.apache.spark.storage.BlockManager.reregister(BlockManager.scala:589)
at org.apache.spark.executor.Executor.reportHeartBeat(Executor.scala:1000)
at org.apache.spark.executor.Executor.$anonfun$heartbeater$1(Executor.scala:212)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
at org.apache.spark.Heartbeater$$anon$1.run(Heartbeater.scala:46)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:296)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:524)
at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:116)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
21/11/03 11:11:21 ERROR org.apache.spark.rpc.netty.Inbox: Ignoring error
java.lang.NullPointerException
at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:524)
at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:116)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
21/11/03 11:11:26 INFO org.apache.beam.runners.spark.SparkPipelineRunner: Running job combinetest0windowingtests0testslidingwindowscombine-lcwik-1103181102-2f9939bb_0c5eafd2-ea83-49df-8b03-6d7b8bbb4328 on Spark master local[4]
21/11/03 11:11:26 WARN org.apache.beam.runners.spark.translation.GroupNonMergingWindowsFunctions: Either coder LengthPrefixCoder(ByteArrayCoder) or IntervalWindow$IntervalWindowCoder is not consistent with equals. That might cause issues on some runners.
21/11/03 11:11:26 WARN org.apache.beam.runners.spark.translation.GroupNonMergingWindowsFunctions: Either coder LengthPrefixCoder(ByteArrayCoder) or GlobalWindow$Coder is not consistent with equals. That might cause issues on some runners.
21/11/03 11:11:26 INFO org.apache.beam.runners.spark.SparkPipelineRunner: Job combinetest0windowingtests0testslidingwindowscombine-lcwik-1103181102-2f9939bb_0c5eafd2-ea83-49df-8b03-6d7b8bbb4328: Pipeline translated successfully. Computing outputs
21/11/03 11:11:27 INFO org.apache.beam.fn.harness.FnHarness: Fn Harness started
21/11/03 11:11:27 INFO org.apache.beam.runners.fnexecution.logging.GrpcLoggingService: Beam Fn Logging client connected.
21/11/03 11:11:27 INFO org.apache.beam.fn.harness.FnHarness: Entering instruction processing loop
21/11/03 11:11:27 INFO org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService: Beam Fn Control client connected with id 56-1
21/11/03 11:11:27 INFO org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService: getProcessBundleDescriptor request with id 56-2
21/11/03 11:11:27 INFO org.apache.beam.runners.fnexecution.data.GrpcDataService: Beam Fn Data client connected.
21/11/03 11:11:27 ERROR org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer: Failed to handle for unknown endpoint
org.apache.beam.vendor.grpc.v1p36p0.io.grpc.StatusRuntimeException: CANCELLED: client cancelled
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Status.asRuntimeException(Status.java:526)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onCancel(ServerCalls.java:284)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.PartialForwardingServerCallListener.onCancel(PartialForwardingServerCallListener.java:40)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:23)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:40)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Contexts$ContextualizedServerCallListener.onCancel(Contexts.java:96)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.closedInternal(ServerCallImpl.java:353)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.closed(ServerCallImpl.java:341)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1Closed.runInContext(ServerImpl.java:844)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
21/11/03 11:11:27 ERROR org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer2: Failed to handle for url: "InProcessServer_334"
org.apache.beam.vendor.grpc.v1p36p0.io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.Status.asRuntimeException(Status.java:535)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:478)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:553)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:68)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:739)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:718)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer2$InboundObserver.forwardToConsumerForInstructionId(BeamFnDataGrpcMultiplexer2.java:213)
at org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer2$InboundObserver.onNext(BeamFnDataGrpcMultiplexer2.java:184)
at org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer2$InboundObserver.onNext(BeamFnDataGrpcMultiplexer2.java:157)
at org.apache.beam.sdk.fn.stream.ForwardingClientResponseObserver.onNext(ForwardingClientResponseObserver.java:49)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:465)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:652)
at org.apache.beam.vendor.grpc.v1p36p0.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:637)
... 5 more
{noformat}
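Below is a minimal sketch of the failure mode described in the comment. It assumes a receiver that routes inbound data to consumers keyed by instruction id. The class EndpointDemultiplexer and its register/unregister/forward methods are hypothetical illustrations and are not Beam's BeamFnDataGrpcMultiplexer2 API.

When data arrives for an id that was never registered, or was already removed, a plain map lookup returns null. Dereferencing that null produces a NullPointerException, similar in shape to the one reported at forwardToConsumerForInstructionId in the log above. This sketch logs and drops such data instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/**
 * Hypothetical sketch, not Beam code: routes inbound data to the consumer
 * registered for its instruction id.
 */
public class EndpointDemultiplexer {
  // Registration and delivery may happen on different threads, so use a concurrent map.
  private final Map<String, Consumer<byte[]>> consumers = new ConcurrentHashMap<>();

  public void register(String instructionId, Consumer<byte[]> consumer) {
    consumers.put(instructionId, consumer);
  }

  public void unregister(String instructionId) {
    consumers.remove(instructionId);
  }

  /**
   * Delivers data to the consumer for instructionId. Returns false instead of
   * throwing a NullPointerException when no consumer is registered for that id.
   */
  public boolean forward(String instructionId, byte[] data) {
    Consumer<byte[]> consumer = consumers.get(instructionId);
    if (consumer == null) {
      // Unknown endpoint: the sender used an id this side never registered or already removed.
      System.err.println("Dropping data for unknown instruction id: " + instructionId);
      return false;
    }
    consumer.accept(data);
    return true;
  }

  public static void main(String[] args) {
    EndpointDemultiplexer demux = new EndpointDemultiplexer();
    StringBuilder received = new StringBuilder();
    demux.register("bundle-1", bytes -> received.append(new String(bytes, StandardCharsets.UTF_8)));
    demux.forward("bundle-1", "hello".getBytes(StandardCharsets.UTF_8));
    boolean delivered = demux.forward("bundle-2", "lost".getBytes(StandardCharsets.UTF_8));
    System.out.println(received + " " + delivered); // prints: hello false
  }
}
```

Dropping is only one possible policy. A receiver could instead buffer data for ids whose consumer is not registered yet. That matters if registration can race with the first data element for a bundle.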
> beam_PostCommit_Java_PVR_Spark_Batch timing out
> -----------------------------------------------
>
> Key: BEAM-13164
> URL: https://issues.apache.org/jira/browse/BEAM-13164
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Reporter: Andrew Pilloud
> Assignee: Luke Cwik
> Priority: P1
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> Looks like this went from being a flake to a hard failure:
> https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/
> https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/5009/
> 18:41:18 Build timed out (after 100 minutes). Marking the build as aborted.