Hi,

Has anyone tried reading from or writing to an ADLS account from Beam with the
Spark runner? I tried this recently but could not get it to work.

Steps that I tried:

  1.  Took an HDI + Spark 1.6 cluster with an ADLS account as default storage.
  2.  Built Apache Beam on that cluster, including the fix for
      BEAM-2790 <https://issues.apache.org/jira/browse/BEAM-2790>, which I was
      hitting earlier for ADL as well.
  3.  Modified the WordCount.java example to use HadoopFileSystemOptions.
  4.  Since the HDI + Spark cluster has ADLS as defaultFS, tried two things:
     *   Just gave the input and output paths as adl://home/sample.txt and
         adl://home/output.
     *   In addition to the adl input and output paths, also passed the HDFS
         configuration, including the settings required for adl.

Neither worked.
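
One thing that may be worth checking, as a sketch only and not a confirmed
diagnosis: in a URI such as adl://home/sample.txt, the part right after the two
slashes is parsed as the authority (host), not as a directory. The snippet
below uses only java.net.URI to show the split. The three-slash spelling
adl:///home/sample.txt is a hypothetical alternative added for comparison; it
was not part of the steps above, and whether Beam or the adl connector accepts
it is an assumption that has not been verified here.

```java
import java.net.URI;

// Sketch: shows how java.net.URI splits two spellings of an adl path.
// It does not call Beam or Hadoop; it only illustrates generic URI parsing.
class AdlUriParts {

    // Returns "scheme|authority|path"; a missing part is rendered as "null".
    static String describe(String spec) {
        URI uri = URI.create(spec);
        return uri.getScheme() + "|" + uri.getAuthority() + "|" + uri.getPath();
    }

    public static void main(String[] args) {
        // Path as used in the steps above: "home" becomes the authority.
        System.out.println(describe("adl://home/sample.txt"));
        // prints: adl|home|/sample.txt

        // Hypothetical spelling with an empty authority (assumption, untested
        // against ADLS): the whole "/home/sample.txt" is the path.
        System.out.println(describe("adl:///home/sample.txt"));
        // prints: adl|null|/home/sample.txt
    }
}
```

If the file system lookup uses the authority to pick the storage account, the
first spelling would point at an account named "home" rather than at a /home
directory on the default account.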

  1.  I have checked the ACLs and permissions. In fact, a similar job with the
      same paths works on Spark directly.
  2.  Issues faced:
     *   For input, Beam is not able to find the path. Console log:
         Filepattern adl://home/sample.txt matched 0 files with total size 0
     *   The output path always gets converted to a relative path, something
         like this: /home/user1/adl:/home/output/.tmp....
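
The shape of that output path looks like what a local file system produces when
the adl string is treated as a relative path. The sketch below reproduces the
shape with java.nio only. It assumes a POSIX file system and assumes that
/home/user1 was the working directory of the job; it does not show where in
Beam the conversion happens.

```java
import java.nio.file.Paths;

// Sketch: resolves a URI-looking string as if it were a relative local path.
// Assumes a POSIX file system (on Windows the ':' is rejected as invalid).
class LocalPathFallback {

    // Resolves spec against workingDir using the default local file system.
    static String resolveAsLocal(String workingDir, String spec) {
        return Paths.get(workingDir).resolve(spec).toString();
    }

    public static void main(String[] args) {
        // "/home/user1" is an assumed working directory, for illustration only.
        System.out.println(resolveAsLocal("/home/user1", "adl://home/output"));
        // prints: /home/user1/adl:/home/output
    }
}
```

The doubled slash collapses and the scheme ends up as a directory name, which
matches the reported prefix. That would be consistent with the output string
not being recognised as an adl resource at that point, though this is only an
inference from the path shape.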





I am debugging this further, but wanted to check whether anyone has already hit
this and has a resolution.



Here is the sample code and the command I used.



    HadoopFileSystemOptions options =
        PipelineOptionsFactory.fromArgs(args).as(HadoopFileSystemOptions.class);
    Pipeline p = Pipeline.create(options);
    p.apply(TextIO.read().from(
            options.getHdfsConfiguration().get(0).get("fs.defaultFS")))
     .apply(new CountWords())
     .apply(MapElements.via(new FormatAsTextFn()))
     .apply(TextIO.write().to("adl://home/output"));
    p.run().waitUntilFinish();





spark-submit --class org.apache.beam.examples.WordCount --master local \
  beam-examples-java-2.3.0-SNAPSHOT.jar --runner=SparkRunner \
  --hdfsConfiguration='[{\"fs.defaultFS\": \"hdfs://home/sample.txt\"}]'





P.S.: I created a fat jar to use with Spark just for testing. Is there a more
correct way of running it with the Spark runner?



-Milan.
