[
https://issues.apache.org/jira/browse/BEAM-12088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ismaël Mejía updated BEAM-12088:
--------------------------------
Description:
Both the Spark Classic and Portable runners share the file staging logic, but
the Structured Streaming runner is using a different logic even if the process
should in principle be the same. This manifests on issues when trying to deploy
pipelines via Hadoop YARN with exceptions related to file Staging of non
existent files.
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException:
/tmp/beamTempLocation/86b9809c0719d64ac39af2beb29c73ff71cb8264add4f1bc01f27fefddf6f5c1.jar
(No such file or directory)
at
org.apache.beam.runners.core.construction.resources.PipelineResources.zipDirectory(PipelineResources.java:120)
at
org.apache.beam.runners.core.construction.resources.PipelineResources.packageDirectoriesToStage(PipelineResources.java:94)
at
org.apache.beam.runners.core.construction.resources.PipelineResources.lambda$prepareFilesForStaging$1(PipelineResources.java:85)
at
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
at
org.apache.beam.runners.core.construction.resources.PipelineResources.prepareFilesForStaging(PipelineResources.java:88)
at
org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator.prepareFilesToStageForRemoteClusterExecution(PipelineTranslator.java:61)
at
org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner.translatePipeline(SparkStructuredStreamingRunner.java:199)
at
org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner.run(SparkStructuredStreamingRunner.java:153)
at
org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner.run(SparkStructuredStreamingRunner.java:70)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:322)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:308)
at
com.talend.tpcds.beam.Query3ViaBeamSdkGenericRecord.main(Query3ViaBeamSdkGenericRecord.java:240)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:685)
Caused by: java.io.FileNotFoundException:
/tmp/beamTempLocation/86b9809c0719d64ac39af2beb29c73ff71cb8264add4f1bc01f27fefddf6f5c1.jar
(No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:101)
at
org.apache.beam.runners.core.construction.resources.PipelineResources.zipDirectory(PipelineResources.java:118)
was:Both the Spark Classic and Portable runners share the file staging logic,
but the Structured Streaming runner is using a different logic even if the
process should in principle be the same. This manifests on issues when trying
to deploy pipelines via Hadoop YARN with exceptions related to file Staging.
> Make file staging uniform among Spark Runners
> ---------------------------------------------
>
> Key: BEAM-12088
> URL: https://issues.apache.org/jira/browse/BEAM-12088
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Reporter: Ismaël Mejía
> Assignee: Ismaël Mejía
> Priority: P2
> Fix For: 2.30.0
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Both the Spark Classic and Portable runners share the file staging logic, but
> the Structured Streaming runner is using a different logic even if the
> process should in principle be the same. This manifests on issues when trying
> to deploy pipelines via Hadoop YARN with exceptions related to file Staging
> of non existent files.
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException:
> /tmp/beamTempLocation/86b9809c0719d64ac39af2beb29c73ff71cb8264add4f1bc01f27fefddf6f5c1.jar
> (No such file or directory)
> at
> org.apache.beam.runners.core.construction.resources.PipelineResources.zipDirectory(PipelineResources.java:120)
> at
> org.apache.beam.runners.core.construction.resources.PipelineResources.packageDirectoriesToStage(PipelineResources.java:94)
> at
> org.apache.beam.runners.core.construction.resources.PipelineResources.lambda$prepareFilesForStaging$1(PipelineResources.java:85)
> at
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> at
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> at
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
> at
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> at
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
> at
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
> at
> org.apache.beam.runners.core.construction.resources.PipelineResources.prepareFilesForStaging(PipelineResources.java:88)
> at
> org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator.prepareFilesToStageForRemoteClusterExecution(PipelineTranslator.java:61)
> at
> org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner.translatePipeline(SparkStructuredStreamingRunner.java:199)
> at
> org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner.run(SparkStructuredStreamingRunner.java:153)
> at
> org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner.run(SparkStructuredStreamingRunner.java:70)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:322)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:308)
> at
> com.talend.tpcds.beam.Query3ViaBeamSdkGenericRecord.main(Query3ViaBeamSdkGenericRecord.java:240)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:685)
> Caused by: java.io.FileNotFoundException:
> /tmp/beamTempLocation/86b9809c0719d64ac39af2beb29c73ff71cb8264add4f1bc01f27fefddf6f5c1.jar
> (No such file or directory)
> at java.io.FileOutputStream.open0(Native Method)
> at java.io.FileOutputStream.open(FileOutputStream.java:270)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:101)
> at
> org.apache.beam.runners.core.construction.resources.PipelineResources.zipDirectory(PipelineResources.java:118)
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)