I ran into a weird bug today where trying to read a file from HDFS on a cluster built with Hadoop 2 gives an error saying "No FileSystem for scheme: hdfs". Specifically, this only seems to happen when the application is packaged as an assembly jar, and not when it is run through sbt's run-main.
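For context, the failing read boils down to something like the sketch below (the master URL and HDFS path here are placeholders rather than the exact values used by the test script):

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    object SparkHdfsSketch {
      def main(args: Array[String]) {
        // Placeholder master URL and app name -- substitute the values for your cluster.
        val sc = new SparkContext("spark://<master>:7077", "Simple HDFS App")
        // Reading any hdfs:// path from inside the assembly jar is what triggers
        // "java.io.IOException: No FileSystem for scheme: hdfs"
        val lines = sc.textFile("hdfs://<namenode>:9000/tmp/test.txt")
        println("Lines count: " + lines.count())
        sc.stop()
      }
    }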
The project's setup[0] is pretty simple and is only a slight modification of the project used by the release audit tool. The sbt assembly instructions[1] are mostly copied from Spark's sbt build files. We run into this in SparkR as well, so it would be great if anybody has an idea on how to debug this.

To reproduce, you can do the following:

1. Launch a Spark EC2 cluster with 0.9.0 with --hadoop-major-version=2
2. Clone https://github.com/shivaram/spark-utils
3. Run release-audits/sbt_app_core/run-hdfs-test.sh

Thanks
Shivaram

[0] https://github.com/shivaram/spark-utils/blob/master/release-audits/sbt_app_core/src/main/scala/SparkHdfsApp.scala
[1] https://github.com/shivaram/spark-utils/blob/master/release-audits/sbt_app_core/build.sbt