Hi, FYI, I'm in touch with Microsoft Azure team about that.
We are testing the ADLS support via HDFS. I keep you posted. Regards JB On 11/22/2017 09:12 AM, Milan Chandna wrote:
Hi, Has anyone tried IO from(to) ADLS account on Beam with Spark runner? I was trying recently to do this but was unable to make it work. Steps that I tried: 1. Took HDI + Spark 1.6 cluster with default storage as ADLS account. 2. Built Apache Beam on that. Built to include Beam-2790<https://issues.apache.org/jira/browse/BEAM-2790> fix which earlier I was facing for ADL as well. 3. Modified WordCount.java example to use HadoopFileSystemOptions 4. Since HDI + Spark cluster has ADLS as defaultFS, tried 2 things * Just gave the input path and output path as adl://home/sample.txt and adl://home/output * In addition to adl input and output path, also gave required HDFS configuration with adl required configs as well. Both didn't worked btw. s 1. Have checked ACL's and permissions. In fact similar job with same paths work on Spark directly. 2. Issues faced: * For input, Beam is not able to find the path. Console log: Filepattern adl://home/sample.txt matched 0 files with total size 0 * Output path always gets converted to relative path, something like this: /home/user1/adl:/home/output/.tmp.... Debugging more into this but was checking if someone is already facing this and has some resolution. Here is a sample code and command I used. HadoopFileSystemOptions options = PipelineOptionsFactory.fromArgs(args).as(HadoopFileSystemOptions.class); Pipeline p = Pipeline.create(options); p.apply( TextIO.read().from(options.getHdfsConfiguration().get(0).get("fs.defaultFS"))) .apply(new CountWords()) .apply(MapElements.via(new FormatAsTextFn())) .apply(TextIO.write().to("adl://home/output")); p.run().waitUntilFinish(); spark-submit --class org.apache.beam.examples.WordCount --master local beam-examples-java-2.3.0-SNAPSHOT.jar --runner=SparkRunner --hdfsConfiguration='[{\"fs.defaultFS\": \"hdfs://home/sample.txt\"]' P.S: Created fat jar to use with spark just for testing. Is there any other correct way of running it with Spark runner? -Milan.
-- Jean-Baptiste Onofré [email protected] http://blog.nanthrax.net Talend - http://www.talend.com
