[ https://issues.apache.org/jira/browse/FLINK-19069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190129#comment-17190129 ]
Till Rohrmann commented on FLINK-19069:
---------------------------------------
I think we should make sure that the user code is run in a non-blocking
fashion. This means that we should not leave this decision to whoever
implements a given user interface. At a minimum, we have to make sure that
the call {{FinalizeOnMaster.finalizeGlobal}} is executed outside of the main
thread.
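A minimal sketch of what running the hook off the main thread could look like, assuming a plain {{CompletableFuture}} setup: the user code runs on a separate I/O executor, and only its outcome is handed back to a single-threaded executor that plays the role of the JM main thread. {{FinalizeHook}}, {{finalizeAsync}} and {{runDemo}} are hypothetical names; {{FinalizeHook}} is a simplified stand-in for {{FinalizeOnMaster}}, not Flink's actual API.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OffMainThreadFinalization {

    /** Hypothetical, simplified stand-in for a user-supplied finalization hook. */
    @FunctionalInterface
    interface FinalizeHook {
        void finalizeGlobal(int parallelism) throws Exception;
    }

    /** Runs the hook on the I/O executor and processes its outcome on the main thread. */
    static CompletableFuture<String> finalizeAsync(
            FinalizeHook hook, int parallelism,
            ExecutorService ioExecutor, ExecutorService mainThreadExecutor) {
        return CompletableFuture
                .runAsync(() -> {
                    try {
                        hook.finalizeGlobal(parallelism); // user code, possibly slow
                    } catch (Exception e) {
                        throw new CompletionException(e);
                    }
                }, ioExecutor)
                // Hand the result back to the main thread before touching job state.
                .handleAsync((ignored, failure) -> failure == null ? "FINISHED" : "FAILED",
                        mainThreadExecutor);
    }

    /** Demo: checks that the main thread stays responsive while the hook is blocked. */
    static String runDemo(boolean failHook) {
        ExecutorService io = Executors.newCachedThreadPool();
        ExecutorService main = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        try {
            CompletableFuture<String> outcome = finalizeAsync(p -> {
                release.await(); // simulates a long-running commit
                if (failHook) {
                    throw new IOException("commit failed");
                }
            }, 4, io, main);
            // The main thread can still serve other requests in the meantime.
            String ping = main.submit(() -> "pong").get(5, TimeUnit.SECONDS);
            release.countDown();
            return ping + "/" + outcome.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            io.shutdownNow();
            main.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo(false)); // pong/FINISHED
        System.out.println(runDemo(true));  // pong/FAILED
    }
}
```

Handing the result back via {{handleAsync(..., mainThreadExecutor)}} keeps every job-state change on the main thread, so in this sketch the state itself still needs no locking.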
In the {{ExecutionGraph}} we need to handle the results of these concurrent
calls properly, which also means handling concurrent {{JobStatus}} changes of
the {{ExecutionGraph}}. We also need to think about what happens if the job
gets cancelled concurrently. Assuming that the {{finalizeOnMaster}} calls belong
to the lifetime of the {{ExecutionGraph}}, we would have to wait for these calls
to finish before we can move the {{ExecutionGraph}} into a terminal state.
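One way to model the cancellation concern, sketched under simplifying assumptions: the job enters a hypothetical {{FINALIZING}} state, a concurrent cancel only moves it to {{CANCELLING}}, and a terminal state ({{FINISHED}}, {{FAILED}} or {{CANCELED}}) is reached only once the finalization future has completed on the main thread. The {{Status}} enum and the {{FinalizationLifecycle}} class are illustrative only and do not correspond to Flink's {{JobStatus}} or {{ExecutionGraph}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FinalizationLifecycle {

    /** Hypothetical, simplified job states (not Flink's actual JobStatus enum). */
    enum Status { RUNNING, FINALIZING, CANCELLING, FINISHED, FAILED, CANCELED }

    private final ExecutorService mainThread; // every state change runs here
    private final ExecutorService ioExecutor; // user finalization code runs here
    private final List<Status> history = new ArrayList<>();
    private final CompletableFuture<Status> terminalState = new CompletableFuture<>();
    private Status status = Status.RUNNING;

    FinalizationLifecycle(ExecutorService mainThread, ExecutorService ioExecutor) {
        this.mainThread = mainThread;
        this.ioExecutor = ioExecutor;
        history.add(status);
    }

    private void transitionTo(Status next) {
        status = next;
        history.add(next);
        if (next == Status.FINISHED || next == Status.FAILED || next == Status.CANCELED) {
            terminalState.complete(next);
        }
    }

    /** Main thread: all tasks are done, start the (possibly slow) finalization. */
    void onAllTasksFinished(Runnable finalizeHook) {
        transitionTo(Status.FINALIZING);
        CompletableFuture.runAsync(finalizeHook, ioExecutor)
                .whenCompleteAsync((ignored, failure) -> onFinalizationDone(failure), mainThread);
    }

    /** Main thread: a cancel request arrived. */
    void cancel() {
        if (status == Status.FINALIZING) {
            transitionTo(Status.CANCELLING); // wait for finalization before going terminal
        } else if (status == Status.RUNNING) {
            transitionTo(Status.CANCELED); // simplified: no running tasks modeled here
        }
    }

    /** Main thread: the finalization future has completed. */
    private void onFinalizationDone(Throwable failure) {
        if (status == Status.CANCELLING) {
            transitionTo(Status.CANCELED);
        } else if (failure != null) {
            transitionTo(Status.FAILED);
        } else {
            transitionTo(Status.FINISHED);
        }
    }

    /** Demo: optionally cancels while the (blocked) finalization hook is still running. */
    static List<Status> runScenario(boolean cancelDuringFinalization) {
        ExecutorService main = Executors.newSingleThreadExecutor();
        ExecutorService io = Executors.newCachedThreadPool();
        CountDownLatch release = new CountDownLatch(1);
        try {
            FinalizationLifecycle job = new FinalizationLifecycle(main, io);
            main.submit(() -> job.onAllTasksFinished(() -> {
                try {
                    release.await(); // simulates a long-running commit
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            })).get();
            if (cancelDuringFinalization) {
                main.submit(job::cancel).get();
            }
            release.countDown();
            job.terminalState.get(5, TimeUnit.SECONDS);
            return main.submit(() -> new ArrayList<Status>(job.history)).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            io.shutdownNow();
            main.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runScenario(false)); // [RUNNING, FINALIZING, FINISHED]
        System.out.println(runScenario(true));  // [RUNNING, FINALIZING, CANCELLING, CANCELED]
    }
}
```

Because both the cancel request and the finalization callback run on the same single-threaded executor, they never race on {{status}}; the cancel is simply recorded and honored once finalization is done.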
> finalizeOnMaster takes too much time and client timeouts
> --------------------------------------------------------
>
> Key: FLINK-19069
> URL: https://issues.apache.org/jira/browse/FLINK-19069
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.9.0, 1.10.0, 1.11.0, 1.12.0
> Reporter: Jiayi Liao
> Priority: Critical
> Fix For: 1.12.0, 1.11.2, 1.10.3
>
>
> Currently we execute {{finalizeOnMaster}} in the JM's main thread, which may
> block the JM for a very long time and eventually cause the client to time out.
> For example, we'd like to write data to HDFS and commit the files on the JM,
> which takes more than ten minutes for tens of thousands of files.
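The failure mode described above can be reproduced in miniature, assuming a single-threaded executor stands in for the JM main thread: while a slow commit occupies that thread, a client request queued behind it cannot be answered and times out. {{BlockedMainThreadDemo}} and {{clientTimesOut}} are hypothetical names, and the latch only simulates a long-running commit.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockedMainThreadDemo {

    /** Returns true if a client request times out while the main thread is busy committing. */
    static boolean clientTimesOut(long clientTimeoutMillis) {
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        CountDownLatch commitDone = new CountDownLatch(1);
        try {
            // Slow finalization executed directly on the single main thread.
            mainThread.submit(() -> {
                commitDone.await(); // stands in for committing many files
                return null;
            });
            // A client request is queued behind the commit and cannot run yet.
            Future<String> reply = mainThread.submit(() -> "job status");
            try {
                reply.get(clientTimeoutMillis, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException e) {
                return true;
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            commitDone.countDown(); // let the queued work drain
            mainThread.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(clientTimesOut(100)); // true
    }
}
```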