[
https://issues.apache.org/jira/browse/BEAM-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438324#comment-17438324
]
Luke Cwik commented on BEAM-13164:
----------------------------------
There appears to be a general race condition where we pass the
`InboundObserver` to the `outboundObserverFactory`
[here|https://github.com/apache/beam/blob/bee60cdc29d995170d4461692e391683ca7dcafd/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataGrpcMultiplexer2.java#L68].
This allows incoming calls to start before the
`BeamFnDataGrpcMultiplexer2` is fully initialized, meaning that
`erroredInstructionIds` can be accessed before it is initialized.
This seems like a general problem with how we construct the outbound
observer. It only happens on the SDK harness, since there is no need
to connect the inbound and outbound observers this way on the
ServerCallStreamObservers.
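To illustrate the pattern, here is a minimal single-threaded sketch of a callback escaping from a constructor before a field is assigned. All class and method names are hypothetical and this is not the Beam code. The "eager" factory simulates a transport that delivers an inbound message as soon as the observer is registered, which in the real race would happen on another thread. The static-factory variant is only one possible way to avoid the problem, not necessarily the fix applied for this issue.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical names throughout; a simplified model, not the Beam classes.
final class ThisEscapeSketch {

  /** Hands out a callback that reads a field the constructor has not assigned yet. */
  static final class UnsafeMultiplexer {
    private final Consumer<String> outbound;
    private final Set<String> erroredIds; // assigned last, i.e. too late

    UnsafeMultiplexer(Function<Consumer<String>, Consumer<String>> outboundFactory) {
      // The inbound callback captures `this` and escapes to the factory here.
      this.outbound = outboundFactory.apply(this::onInbound);
      this.erroredIds = new HashSet<>();
    }

    void onInbound(String instructionId) {
      // Throws NullPointerException if invoked while the constructor is running.
      erroredIds.contains(instructionId);
    }
  }

  /** Initializes all state first, then connects the observers in a static factory. */
  static final class SafeMultiplexer {
    private final Set<String> erroredIds = new HashSet<>();
    private Consumer<String> outbound;

    private SafeMultiplexer() {}

    static SafeMultiplexer create(Function<Consumer<String>, Consumer<String>> outboundFactory) {
      SafeMultiplexer multiplexer = new SafeMultiplexer();
      // The object is fully constructed before the callback can be invoked.
      multiplexer.outbound = outboundFactory.apply(multiplexer::onInbound);
      return multiplexer;
    }

    void onInbound(String instructionId) {
      erroredIds.contains(instructionId);
    }
  }

  /** Simulates a transport that delivers a message as soon as the observer is registered. */
  static Consumer<String> eagerFactory(Consumer<String> inbound) {
    inbound.accept("instruction-1");
    return message -> {};
  }

  /** Returns true if constructing the unsafe variant hits the uninitialized field. */
  static boolean unsafeFails() {
    try {
      new UnsafeMultiplexer(ThisEscapeSketch::eagerFactory);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  /** Returns true if the safe variant handles the early message without error. */
  static boolean safeSucceeds() {
    SafeMultiplexer.create(ThisEscapeSketch::eagerFactory);
    return true;
  }

  public static void main(String[] args) {
    System.out.println("unsafe variant fails: " + unsafeFails());
    System.out.println("safe variant succeeds: " + safeSucceeds());
  }
}
```

In the sketch, the unsafe variant fails deterministically because the callback runs synchronously inside the constructor. With a real gRPC stream the callback would arrive on a different thread, so the failure would be intermittent, which would be consistent with a test that first appears as a flake.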
> beam_PostCommit_Java_PVR_Spark_Batch timing out
> -----------------------------------------------
>
> Key: BEAM-13164
> URL: https://issues.apache.org/jira/browse/BEAM-13164
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Reporter: Andrew Pilloud
> Assignee: Luke Cwik
> Priority: P1
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> Looks like this went from being a flake to a hard failure:
> https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/
> https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/5009/
> 18:41:18 Build timed out (after 100 minutes). Marking the build as aborted.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)