[
https://issues.apache.org/jira/browse/BEAM-9225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041248#comment-17041248
]
Kyle Weaver commented on BEAM-9225:
-----------------------------------
The Flink uber jar job server has two streams, one for state and one for
messages. They share a single history [1], and neither stream yields an item
that is a duplicate according to that history [2]. So if the message stream
reads the terminal state first, the state stream will never yield it, and the
job will never register as complete.
I avoided this in the Spark implementation by deferring de-duplication until
later, so every message is yielded whether or not it is a duplicate.
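To make the race concrete, here is a minimal sketch in Python. It is a
simplified model, not the actual Beam code: the class and function names
(SharedHistoryStreams, yield_all, dedupe) are hypothetical, and events are
plain strings. It shows how a shared history can swallow the terminal state,
and how deferring de-duplication to the consumer avoids that.

```python
from typing import Iterable, Iterator, List, Set


class SharedHistoryStreams:
    """Hypothetical model: both streams skip anything in one shared history."""

    def __init__(self, events: Iterable[str]) -> None:
        self._events: List[str] = list(events)
        self._seen: Set[str] = set()  # history shared by both streams

    def _stream(self) -> Iterator[str]:
        for event in self._events:
            if event in self._seen:
                continue  # dropped: the *other* stream may have recorded it
            self._seen.add(event)
            yield event

    def message_stream(self) -> Iterator[str]:
        return self._stream()

    def state_stream(self) -> Iterator[str]:
        return self._stream()


def yield_all(events: Iterable[str]) -> Iterator[str]:
    """Deferred approach: the stream itself never filters duplicates."""
    yield from events


def dedupe(events: Iterable[str]) -> Iterator[str]:
    """De-duplication applied later, with a history private to one consumer."""
    seen: Set[str] = set()
    for event in events:
        if event not in seen:
            seen.add(event)
            yield event


if __name__ == "__main__":
    events = ["RUNNING", "DONE"]

    shared = SharedHistoryStreams(events)
    messages = list(shared.message_stream())  # message stream runs first
    states = list(shared.state_stream())      # state stream sees nothing new
    print("shared history: messages =", messages, "states =", states)

    # Each consumer de-duplicates its own copy, so neither starves the other.
    messages = list(dedupe(yield_all(events)))
    states = list(dedupe(yield_all(events)))
    print("deferred dedup: messages =", messages, "states =", states)
```

In the shared-history case the state stream never observes "DONE" once the
message stream has consumed it, which mirrors the hang described above. With
de-duplication deferred, each consumer still sees the terminal state.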
[1]
https://github.com/apache/beam/blob/57ce5b966cfc4f549082a47f50c29d5f9caa2909/sdks/python/apache_beam/runners/portability/abstract_job_service.py#L252
[2]
https://github.com/apache/beam/blob/57ce5b966cfc4f549082a47f50c29d5f9caa2909/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py#L205
> Flink uber jar job server hangs
> -------------------------------
>
> Key: BEAM-9225
> URL: https://issues.apache.org/jira/browse/BEAM-9225
> Project: Beam
> Issue Type: Bug
> Components: runner-flink
> Reporter: Kyle Weaver
> Assignee: Kyle Weaver
> Priority: Major
> Labels: portability-flink
> Fix For: 2.20.0
>
>
> This was observed on Kubernetes. I suspect this behavior might also be the
> reason beam_PostCommit_PortableJar_Flink is timing out.