Re: Accessing Hadoop2 HDFS from Spark app

Jey Kottalam Mon, 17 Feb 2014 18:29:23 -0800

We ran into this issue with ADAM, and it came down to the
"META-INF/services" files not being merged correctly. Here's the change we
made to our Maven build files to fix it; you can probably do something
similar under SBT too:
https://github.com/bigdatagenomics/adam/commit/b0997760b23c4284efe32eeb968ef2744af8be82
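
For the SBT side, here is a minimal sketch of what "something similar" might look like with the sbt-assembly plugin. It assumes a recent plugin version and sbt 1.x slash syntax; older releases spell the setting differently, so check the key names against your plugin version. In Hadoop 2, both hadoop-common and hadoop-hdfs ship a META-INF/services/org.apache.hadoop.fs.FileSystem file, so keeping only one copy can drop the hdfs entry. The sketch keeps every distinct line from all copies instead:

```scala
// build.sbt fragment (sketch; assumes the sbt-assembly plugin is enabled)
assembly / assemblyMergeStrategy := {
  // Service-loader files list implementations one per line; keep every
  // distinct line from all jars instead of picking a single file.
  case PathList("META-INF", "services", _*) => MergeStrategy.filterDistinctLines
  // Defer to the plugin's existing strategy for everything else.
  case other =>
    val previous = (assembly / assemblyMergeStrategy).value
    previous(other)
}
```

After rebuilding, one way to check the result is to open the assembly jar and confirm that META-INF/services/org.apache.hadoop.fs.FileSystem lists org.apache.hadoop.hdfs.DistributedFileSystem.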


-Jey


On Mon, Feb 17, 2014 at 6:15 PM, Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:

> I ran into a weird bug today where trying to read a file from HDFS
> built using Hadoop 2 gives an error saying "No FileSystem for scheme:
> hdfs".  Specifically this only seems to happen when building an
> assembly jar in the application and not when using sbt's run-main.
>
> The project's setup[0] is pretty simple and is only a slight
> modification of the project used by the release audit tool. The sbt
> assembly instructions[1] are mostly copied from Spark's sbt build
> files.
>
> We run into this in SparkR as well, so it would be great if anybody has
> an idea on how to debug this.
> To reproduce, you can do the following:
>
> 1. Launch a Spark EC2 cluster with 0.9.0 with --hadoop-major-version=2
> 2. Clone https://github.com/shivaram/spark-utils
> 3. Run release-audits/sbt_app_core/run-hdfs-test.sh
>
> Thanks
> Shivaram
>
> [0]
> https://github.com/shivaram/spark-utils/blob/master/release-audits/sbt_app_core/src/main/scala/SparkHdfsApp.scala
> [1]
> https://github.com/shivaram/spark-utils/blob/master/release-audits/sbt_app_core/build.sbt
>
