[jira] [Updated] (BEAM-12088) Make file staging uniform among Spark Runners

Jira Fri, 02 Apr 2021 16:50:04 -0700


     [ 
https://issues.apache.org/jira/browse/BEAM-12088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ismaël Mejía updated BEAM-12088:
--------------------------------
    Description: 
Both the Spark Classic and Portable runners share the file staging logic, but 
the Structured Streaming runner is using a different logic even if the process 
should in principle be the same. This manifests on issues when trying to deploy 
pipelines via Hadoop YARN with exceptions related to file Staging of non 
existent files.

Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
/tmp/beamTempLocation/86b9809c0719d64ac39af2beb29c73ff71cb8264add4f1bc01f27fefddf6f5c1.jar
 (No such file or directory)
        at 
org.apache.beam.runners.core.construction.resources.PipelineResources.zipDirectory(PipelineResources.java:120)
        at 
org.apache.beam.runners.core.construction.resources.PipelineResources.packageDirectoriesToStage(PipelineResources.java:94)
        at 
org.apache.beam.runners.core.construction.resources.PipelineResources.lambda$prepareFilesForStaging$1(PipelineResources.java:85)
        at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at 
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
        at 
org.apache.beam.runners.core.construction.resources.PipelineResources.prepareFilesForStaging(PipelineResources.java:88)
        at 
org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator.prepareFilesToStageForRemoteClusterExecution(PipelineTranslator.java:61)
        at 
org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner.translatePipeline(SparkStructuredStreamingRunner.java:199)
        at 
org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner.run(SparkStructuredStreamingRunner.java:153)
        at 
org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner.run(SparkStructuredStreamingRunner.java:70)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:322)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:308)
        at 
com.talend.tpcds.beam.Query3ViaBeamSdkGenericRecord.main(Query3ViaBeamSdkGenericRecord.java:240)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:685)
Caused by: java.io.FileNotFoundException: 
/tmp/beamTempLocation/86b9809c0719d64ac39af2beb29c73ff71cb8264add4f1bc01f27fefddf6f5c1.jar
 (No such file or directory)
        at java.io.FileOutputStream.open0(Native Method)
        at java.io.FileOutputStream.open(FileOutputStream.java:270)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:101)
        at 
org.apache.beam.runners.core.construction.resources.PipelineResources.zipDirectory(PipelineResources.java:118)


 

  was:Both the Spark Classic and Portable runners share the file staging logic, 
but the Structured Streaming runner is using a different logic even if the 
process should in principle be the same. This manifests on issues when trying 
to deploy pipelines via Hadoop YARN with exceptions related to file Staging.


> Make file staging uniform among Spark Runners
> ---------------------------------------------
>
>                 Key: BEAM-12088
>                 URL: https://issues.apache.org/jira/browse/BEAM-12088
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Ismaël Mejía
>            Assignee: Ismaël Mejía
>            Priority: P2
>             Fix For: 2.30.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Both the Spark Classic and Portable runners share the file staging logic, but 
> the Structured Streaming runner is using a different logic even if the 
> process should in principle be the same. This manifests on issues when trying 
> to deploy pipelines via Hadoop YARN with exceptions related to file Staging 
> of non existent files.
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
> /tmp/beamTempLocation/86b9809c0719d64ac39af2beb29c73ff71cb8264add4f1bc01f27fefddf6f5c1.jar
>  (No such file or directory)
>         at 
> org.apache.beam.runners.core.construction.resources.PipelineResources.zipDirectory(PipelineResources.java:120)
>         at 
> org.apache.beam.runners.core.construction.resources.PipelineResources.packageDirectoriesToStage(PipelineResources.java:94)
>         at 
> org.apache.beam.runners.core.construction.resources.PipelineResources.lambda$prepareFilesForStaging$1(PipelineResources.java:85)
>         at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>         at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>         at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
>         at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>         at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>         at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>         at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>         at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
>         at 
> org.apache.beam.runners.core.construction.resources.PipelineResources.prepareFilesForStaging(PipelineResources.java:88)
>         at 
> org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator.prepareFilesToStageForRemoteClusterExecution(PipelineTranslator.java:61)
>         at 
> org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner.translatePipeline(SparkStructuredStreamingRunner.java:199)
>         at 
> org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner.run(SparkStructuredStreamingRunner.java:153)
>         at 
> org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner.run(SparkStructuredStreamingRunner.java:70)
>         at org.apache.beam.sdk.Pipeline.run(Pipeline.java:322)
>         at org.apache.beam.sdk.Pipeline.run(Pipeline.java:308)
>         at 
> com.talend.tpcds.beam.Query3ViaBeamSdkGenericRecord.main(Query3ViaBeamSdkGenericRecord.java:240)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:685)
> Caused by: java.io.FileNotFoundException: 
> /tmp/beamTempLocation/86b9809c0719d64ac39af2beb29c73ff71cb8264add4f1bc01f27fefddf6f5c1.jar
>  (No such file or directory)
>         at java.io.FileOutputStream.open0(Native Method)
>         at java.io.FileOutputStream.open(FileOutputStream.java:270)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:101)
>         at 
> org.apache.beam.runners.core.construction.resources.PipelineResources.zipDirectory(PipelineResources.java:118)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (BEAM-12088) Make file staging uniform among Spark Runners

Reply via email to