[
https://issues.apache.org/jira/browse/BEAM-12088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314256#comment-17314256
]
Ismaël Mejía commented on BEAM-12088:
-------------------------------------
[~ibzib] I opened BEAM-12091 as a follow up for your suggestions on the PR. I
will mark this as resolved and continue in the other one.
> Make file staging uniform among Spark Runners
> ---------------------------------------------
>
> Key: BEAM-12088
> URL: https://issues.apache.org/jira/browse/BEAM-12088
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Reporter: Ismaël Mejía
> Assignee: Ismaël Mejía
> Priority: P2
> Fix For: 2.30.0
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Both the Spark Classic and Portable runners share the file staging logic, but
> the Structured Streaming runner is using a different logic even if the
> process should in principle be the same. This manifests on issues when trying
> to deploy pipelines via Hadoop YARN with exceptions related to file Staging
> of non existent files.
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException:
> /tmp/beamTempLocation/86b9809c0719d64ac39af2beb29c73ff71cb8264add4f1bc01f27fefddf6f5c1.jar
> (No such file or directory)
> at
> org.apache.beam.runners.core.construction.resources.PipelineResources.zipDirectory(PipelineResources.java:120)
> at
> org.apache.beam.runners.core.construction.resources.PipelineResources.packageDirectoriesToStage(PipelineResources.java:94)
> at
> org.apache.beam.runners.core.construction.resources.PipelineResources.lambda$prepareFilesForStaging$1(PipelineResources.java:85)
> at
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> at
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> at
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
> at
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> at
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
> at
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
> at
> org.apache.beam.runners.core.construction.resources.PipelineResources.prepareFilesForStaging(PipelineResources.java:88)
> at
> org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator.prepareFilesToStageForRemoteClusterExecution(PipelineTranslator.java:61)
> at
> org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner.translatePipeline(SparkStructuredStreamingRunner.java:199)
> at
> org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner.run(SparkStructuredStreamingRunner.java:153)
> at
> org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner.run(SparkStructuredStreamingRunner.java:70)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:322)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:308)
> at
> com.talend.tpcds.beam.Query3ViaBeamSdkGenericRecord.main(Query3ViaBeamSdkGenericRecord.java:240)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:685)
> Caused by: java.io.FileNotFoundException:
> /tmp/beamTempLocation/86b9809c0719d64ac39af2beb29c73ff71cb8264add4f1bc01f27fefddf6f5c1.jar
> (No such file or directory)
> at java.io.FileOutputStream.open0(Native Method)
> at java.io.FileOutputStream.open(FileOutputStream.java:270)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:101)
> at
> org.apache.beam.runners.core.construction.resources.PipelineResources.zipDirectory(PipelineResources.java:118)
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)