[
https://issues.apache.org/jira/browse/BEAM-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kyle Weaver updated BEAM-8312:
------------------------------
Description:
Currently, Flink job jars re-stage all artifacts at runtime (on the JobManager)
by using the usual BeamFileSystemArtifactRetrievalService [1]. However, since
the manifest and all the artifacts live on the classpath of the jar, and
everything from the classpath is copied to the Flink workers anyway [2], it
should not be necessary to do additional artifact staging. We could replace
BeamFileSystemArtifactRetrievalService in this case with a simple
ArtifactRetrievalService that just pulls the artifacts from the classpath.
[1]
[https://github.com/apache/beam/blob/340c3202b1e5824b959f5f9f626e4c7c7842a3cb/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactRetrievalService.java]
[2]
[https://github.com/apache/beam/blob/2f1b56ccc506054e40afe4793a8b556e872e1865/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java#L93]
was:
Currently, Flink job jars stage all artifacts by using the usual
BeamFileSystemArtifactRetrievalService [1]. However, since the manifest and all
the artifacts live on the classpath of the jar, and everything from the
classpath is copied to the Flink workers anyway, it should not be necessary to
do additional artifact staging. We could replace
BeamFileSystemArtifactRetrievalService in this case with a simple
ArtifactRetrievalService that just pulls the artifacts from the classpath.
[1]
[https://github.com/apache/beam/blob/340c3202b1e5824b959f5f9f626e4c7c7842a3cb/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactRetrievalService.java]
> Flink portable pipeline jars do not need to stage artifacts remotely
> --------------------------------------------------------------------
>
> Key: BEAM-8312
> URL: https://issues.apache.org/jira/browse/BEAM-8312
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Reporter: Kyle Weaver
> Assignee: Kyle Weaver
> Priority: Major
> Labels: portability-flink
>
> Currently, Flink job jars re-stage all artifacts at runtime (on the
> JobManager) by using the usual BeamFileSystemArtifactRetrievalService [1].
> However, since the manifest and all the artifacts live on the classpath of
> the jar, and everything from the classpath is copied to the Flink workers
> anyway [2], it should not be necessary to do additional artifact staging. We
> could replace BeamFileSystemArtifactRetrievalService in this case with a
> simple ArtifactRetrievalService that just pulls the artifacts from the
> classpath.
>
> [1]
> [https://github.com/apache/beam/blob/340c3202b1e5824b959f5f9f626e4c7c7842a3cb/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactRetrievalService.java]
> [2]
> [https://github.com/apache/beam/blob/2f1b56ccc506054e40afe4793a8b556e872e1865/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java#L93]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)