Thanks a lot Jey! That fixes things. For reference I had to add the following line to build.sbt:

case m if m.toLowerCase.matches("meta-inf/services.*$") => MergeStrategy.concat

Should we also add this to Spark's assembly build?

Thanks
Shivaram

On Mon, Feb 17, 2014 at 6:27 PM, Jey Kottalam <j...@cs.berkeley.edu> wrote:
> We ran into this issue with ADAM, and it came down to not merging the
> "META-INF/services" files correctly. Here's the change we made to our
> Maven build files to fix it; you can probably do something similar under
> SBT too:
> https://github.com/bigdatagenomics/adam/commit/b0997760b23c4284efe32eeb968ef2744af8be82
>
> -Jey
>
> On Mon, Feb 17, 2014 at 6:15 PM, Shivaram Venkataraman
> <shiva...@eecs.berkeley.edu> wrote:
>>
>> I ran into a weird bug today where trying to read a file from an HDFS
>> cluster built using Hadoop 2 gives an error saying "No FileSystem for
>> scheme: hdfs". This only seems to happen when building an assembly jar
>> for the application, not when using sbt's run-main.
>>
>> The project's setup [0] is pretty simple and is only a slight
>> modification of the project used by the release audit tool. The sbt
>> assembly instructions [1] are mostly copied from Spark's sbt build
>> files.
>>
>> We run into this in SparkR as well, so it would be great if anybody has
>> an idea of how to debug this. To reproduce, you can do the following:
>>
>> 1. Launch a Spark EC2 cluster with 0.9.0 and --hadoop-major-version=2
>> 2. Clone https://github.com/shivaram/spark-utils
>> 3. Run release-audits/sbt_app_core/run-hdfs-test.sh
>>
>> Thanks
>> Shivaram
>>
>> [0] https://github.com/shivaram/spark-utils/blob/master/release-audits/sbt_app_core/src/main/scala/SparkHdfsApp.scala
>> [1] https://github.com/shivaram/spark-utils/blob/master/release-audits/sbt_app_core/build.sbt
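
For anyone who hits this later: below is a minimal sketch of how that case might fit into a complete merge-strategy block in build.sbt, assuming the older sbt-assembly key names (mergeStrategy in assembly) in use at the time. The surrounding cases are illustrative defaults, not copied from the actual project's build.sbt.

// build.sbt -- illustrative sbt-assembly merge strategy
mergeStrategy in assembly := {
  // Concatenate ServiceLoader registration files so the entry that
  // hadoop-hdfs ships for org.apache.hadoop.fs.FileSystem survives the merge.
  case m if m.toLowerCase.matches("meta-inf/services.*$") => MergeStrategy.concat
  // Manifests and signature files from dependency jars can be dropped.
  case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.(sf|dsa|rsa)$") => MergeStrategy.discard
  // Keep the first copy of anything else.
  case _ => MergeStrategy.first
}

The concat matters because Hadoop 2 discovers FileSystem implementations via java.util.ServiceLoader, i.e. the META-INF/services/org.apache.hadoop.fs.FileSystem files inside each jar. If the assembly keeps only one of those files, the hdfs entry from hadoop-hdfs can be lost, which is exactly what produces "No FileSystem for scheme: hdfs".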