In your example it seems as though your HDFS configuration doesn't contain
any ADL-specific configuration: "--hdfsConfiguration='[{\"fs.defaultFS\":
Do you have a core-site.xml or hdfs-site.xml configured as per:

From the documentation for --hdfsConfiguration:
A list of Hadoop configurations used to configure zero or more Hadoop
filesystems. By default, Hadoop configuration is loaded from
'core-site.xml' and 'hdfs-site.xml' based upon the HADOOP_CONF_DIR and
YARN_CONF_DIR environment variables. To specify configuration on the
command-line, represent the value as a JSON list of JSON maps, where each
map represents the entire configuration for a single Hadoop filesystem. For
example --hdfsConfiguration='[{\"\":
\"hdfs://localhost:9998\", ...},{\"\": \"s3a://\", ...},...]'
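Per the description above, the value is a JSON list holding one map per Hadoop filesystem. Below is a minimal stdlib-only sketch that renders such a value from Java maps. The class name `HdfsConfigurationArg`, its helpers, and the store name `example` are hypothetical. The ADL credential keys are deliberately left out rather than guessed; they would go into the same map.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class HdfsConfigurationArg {

    // Escapes a string for use inside a JSON string literal.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':
                    sb.append("\\\"");
                    break;
                case '\\':
                    sb.append("\\\\");
                    break;
                default:
                    if (c < 0x20) {
                        // Other control characters use the 4-hex-digit unicode escape form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Renders one JSON map per filesystem, all wrapped in a single JSON list,
    // which is the shape described for --hdfsConfiguration.
    static String toJson(List<Map<String, String>> filesystems) {
        StringJoiner list = new StringJoiner(",", "[", "]");
        for (Map<String, String> fs : filesystems) {
            StringJoiner map = new StringJoiner(",", "{", "}");
            for (Map.Entry<String, String> e : fs.entrySet()) {
                map.add("\"" + jsonEscape(e.getKey()) + "\":\"" + jsonEscape(e.getValue()) + "\"");
            }
            list.add(map.toString());
        }
        return list.toString();
    }

    public static void main(String[] args) {
        Map<String, String> adl = new LinkedHashMap<>();
        // Hypothetical store name; ADL URIs take the form adl://<account>.azuredatalakestore.net
        adl.put("fs.defaultFS", "adl://example.azuredatalakestore.net");
        // ADL credential settings would be added to this same map (omitted here).
        System.out.println("--hdfsConfiguration='" + toJson(Collections.singletonList(adl)) + "'");
    }
}
```

This prints `--hdfsConfiguration='[{"fs.defaultFS":"adl://example.azuredatalakestore.net"}]'`, i.e. plain JSON inside single quotes. Whether the inner quotes also need backslash-escaping, as in the examples in this thread, depends on how many layers of shell quoting the argument passes through (for example, when it is nested inside a double-quoted `-Dexec.args` string).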

On Wed, Nov 22, 2017 at 1:12 AM, Jean-Baptiste Onofré wrote:

> Hi,
> FYI, I'm in touch with Microsoft Azure team about that.
> We are testing the ADLS support via HDFS.
> I'll keep you posted.
> Regards
> JB
> On 11/22/2017 09:12 AM, Milan Chandna wrote:
>> Hi,
>> Has anyone tried reading from or writing to an ADLS account with Beam on
>> the Spark runner? I tried this recently but was unable to make it work.
>> Steps that I tried:
>>    1.  Took an HDI + Spark 1.6 cluster with an ADLS account as the default storage.
>>    2.  Built Apache Beam on that, including the fix for BEAM-2790, which I
>> had earlier been facing for ADL as well.
>>    3.  Modified the example to use HadoopFileSystemOptions.
>>    4.  Since the HDI + Spark cluster has ADLS as defaultFS, tried two things:
>>       *   Just gave the input and output paths as
>> adl://home/sample.txt and adl://home/output
>>       *   In addition to the adl input and output paths, also passed the
>> HDFS configuration, including the required ADL configs.
>> Neither worked.
>>    1.  Have checked ACLs and permissions. In fact, a similar job with the
>> same paths works on Spark directly.
>>    2.  Issues faced:
>>       *   For input, Beam cannot find the path. Console log:
>> Filepattern adl://home/sample.txt matched 0 files with total size 0
>>       *   The output path always gets converted to a relative path,
>> like this: /home/user1/adl:/home/output/.tmp....
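Regarding the output path: `/home/user1/adl:/home/output/...` is what a Unix JVM produces when the string `adl://home/output` is treated as an ordinary local path. `java.nio.file.Paths` collapses the `//`, the result is relative, and it is then resolved against the working directory. That hints, without proving, that the write went through a local filesystem rather than an ADL-aware one. Separately, in `adl://home/...` the URI authority (host) is `home`, whereas ADL URIs normally have the form `adl://<account>.azuredatalakestore.net/<path>`. The stdlib-only check below illustrates both points; the class name `AdlPathCheck` is hypothetical, and the local-path behavior is Unix-specific (on Windows the colon makes it an invalid path).

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

public class AdlPathCheck {

    // Splits the string as a URI: for adl://home/output, "home" is the
    // authority (host) and only "/output" is the path.
    static String[] uriParts(String spec) {
        URI uri = URI.create(spec);
        return new String[] {uri.getScheme(), uri.getAuthority(), uri.getPath()};
    }

    // Treats the same string as a plain local path, as a non-ADL filesystem
    // would; on Unix the "//" is collapsed and the path is relative.
    static Path asLocalPath(String spec) {
        return Paths.get(spec);
    }

    public static void main(String[] args) {
        String output = "adl://home/output";
        String[] parts = uriParts(output);
        System.out.println("scheme=" + parts[0] + " authority=" + parts[1] + " path=" + parts[2]);

        Path local = asLocalPath(output);
        System.out.println("local=" + local + " absolute=" + local.isAbsolute());
        // Resolved against the JVM's working directory (user.dir).
        System.out.println("resolved=" + local.toAbsolutePath());
    }
}
```

Run from `/home/user1`, the last line prints `/home/user1/adl:/home/output`, matching the path reported above.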
>> I'm debugging this further, but wanted to check whether anyone else has
>> hit this and found a resolution.
>> Here is the sample code and command I used:
>>      HadoopFileSystemOptions options =
>>          PipelineOptionsFactory.fromArgs(args).as(HadoopFileSystemOptions.class);
>>      Pipeline p = Pipeline.create(options);
>>      p.apply(
>> HdfsConfiguration().get(0).get("fs.defaultFS")))
>>       .apply(new CountWords())
>>       .apply(MapElements.via(new FormatAsTextFn()))
>>       .apply(TextIO.write().to("adl://home/output"));
>> spark-submit --class org.apache.beam.examples.WordCount --master local
>> beam-examples-java-2.3.0-SNAPSHOT.jar --runner=SparkRunner
>> --hdfsConfiguration='[{\"fs.defaultFS\": \"hdfs://home/sample.txt\"}]'
>> P.S.: I created a fat jar to use with Spark just for testing. Is there
>> another correct way to run it with the Spark runner?
>> -Milan.
