[
https://issues.apache.org/jira/browse/BEAM-11275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282444#comment-17282444
]
Evgeniy edited comment on BEAM-11275 at 2/10/21, 12:43 PM:
-----------------------------------------------------------
I have an issue with Beam on Flink. I am using `FlinkRunner`.
Pipeline options:
{code:java}
--runner=FlinkRunner
--flink_master=*.*.*.*:8081
--environment_type=EXTERNAL
--environment_config=*.*.*.*:50000
--flink_version=1.10
{code}
Flink taskexecutor log:
{code:java}
java.io.FileNotFoundException: /tmp/beam-temp3g1emmsd/artifactsbv63p2is/a9d47b4ae5a1f4a0b5fddd51d2cd152583637e1353a615d5457aa19494051826/1-ref_Environment_default_e-tfx_ephemeral-0.27.0.tar.gz (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:127)
    at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:83)
    at org.apache.beam.sdk.io.FileSystems.open(FileSystems.java:256)
    at org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService.getArtifact(ArtifactRetrievalService.java:124)
    at org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService.getArtifact(ArtifactRetrievalService.java:99)
    at org.apache.beam.model.jobmanagement.v1.ArtifactRetrievalServiceGrpc$MethodHandlers.invoke(ArtifactRetrievalServiceGrpc.java:327)
    at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:172)
    at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
    at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
    at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
    at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)
    at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)
    at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:817)
    at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}
Beam worker pool log:
{code:java}
2021/02/10 12:28:10 Failed to retrieve staged files: failed to retrieve /tmp/staged in 3 attempts: failed to retrieve chunk for /tmp/staged/tfx_ephemeral-0.27.0.tar.gz
{code}
was (Author: ferryvg):
I have an issue with Beam on Flink. I am using `FlinkRunner`.
Pipeline options:
{code:java}
--runner=FlinkRunner
--flink_master=...:8081
--environment_type=EXTERNAL
--environment_config=...:50000
--flink_version=1.10
{code}
> Support GCS files for extra_requirements argument in Python Beam portable
> runners
> ---------------------------------------------------------------------------------
>
> Key: BEAM-11275
> URL: https://issues.apache.org/jira/browse/BEAM-11275
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Gerard Casas Saez
> Assignee: Calvin Leung
> Priority: P2
>
> Currently, portable runners only support locally available files for adding
> dependencies on remote workers. This can be seen in
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/stager.py#L429
> where shutil.copyfile is used when the file is detected as remote and is not HTTP.
> An easy extension would be to extend _is_remote_path in Stager to detect whether
> the path matches any filesystem and, if it does, to avoid downloading and let
> it be copied afterwards.
> Acceptance criteria:
> - `extra_package` can be a GCS path instead of requiring it to be local only.
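
The extension proposed in the description above could look roughly like the sketch below. It is standalone Python and not the actual Stager code: the names `get_scheme`, `classify_package_path` and `KNOWN_FILESYSTEM_SCHEMES` are hypothetical, and the hard-coded scheme set is an assumption standing in for whatever filesystems Beam has registered. It only shows how to tell local paths, HTTP URLs and filesystem paths (for example GCS) apart, so that the last group is not handed to shutil.copyfile.

```python
import re

# Assumption: a fixed set of schemes for illustration only. A real
# implementation would ask Beam which filesystems are registered.
KNOWN_FILESYSTEM_SCHEMES = frozenset({"gs", "s3", "hdfs"})
HTTP_SCHEMES = frozenset({"http", "https"})

# Matches a leading "<scheme>://" prefix such as "gs://" or "https://".
_SCHEME_RE = re.compile(r"^([a-zA-Z][a-zA-Z0-9+.-]*)://")


def get_scheme(path):
    """Return the lowercased scheme of `path`, or None for a plain local path."""
    match = _SCHEME_RE.match(path)
    return match.group(1).lower() if match else None


def classify_package_path(path):
    """Classify a package path as 'local', 'http', 'filesystem' or 'unknown'.

    'filesystem' marks paths that should be left for a filesystem-aware
    copy later instead of being downloaded or copied with shutil.
    """
    scheme = get_scheme(path)
    if scheme is None:
        return "local"
    if scheme in HTTP_SCHEMES:
        return "http"
    if scheme in KNOWN_FILESYSTEM_SCHEMES:
        return "filesystem"
    return "unknown"
```

With this split, a caller would keep the existing handling for 'local' and 'http' and defer 'filesystem' paths to a later copy step; how 'unknown' schemes should be treated is left open here.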