BTW, my fix in Spark was later generalized to be equivalent to what you did, i.e. applying the concat strategy to the entire META-INF/services directory rather than just the FileSystem services file.
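
For reference, the generalized rule looks something like this in an sbt-assembly build (a hypothetical build.sbt excerpt written against the 0.x-era sbt-assembly API; newer plugin versions spell the setting differently):

    // Concatenate every service registration file under META-INF/services.
    // These files are plain-text lists of implementation classes, so
    // concatenating them keeps all providers from all jars instead of
    // letting one jar's copy clobber the others.
    mergeStrategy in assembly <<= (mergeStrategy in assembly) { old =>
      {
        case PathList("META-INF", "services", xs @ _*) => MergeStrategy.concat
        case x => old(x)
      }
    }
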
On Mon, Feb 17, 2014 at 9:26 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> Ya I ran into this a few months ago. We actually patched the Spark
> build back then. It took me a long time to figure it out.
>
> https://github.com/apache/incubator-spark/commit/0c1985b153a2dc2c891ae61c1ee67506926384ae
>
> On Mon, Feb 17, 2014 at 6:47 PM, Shivaram Venkataraman
> <shiva...@eecs.berkeley.edu> wrote:
>> Thanks a lot Jey! That fixes things. For reference, I had to add the
>> following line to build.sbt:
>>
>> case m if m.toLowerCase.matches("meta-inf/services.*$") =>
>> MergeStrategy.concat
>>
>> Should we also add this to Spark's assembly build?
>>
>> Thanks
>> Shivaram
>>
>> On Mon, Feb 17, 2014 at 6:27 PM, Jey Kottalam <j...@cs.berkeley.edu> wrote:
>>> We ran into this issue with ADAM, and it came down to not merging the
>>> "META-INF/services" files correctly. Here's the change we made to our
>>> Maven build files to fix it; you can probably do something similar
>>> under SBT too:
>>> https://github.com/bigdatagenomics/adam/commit/b0997760b23c4284efe32eeb968ef2744af8be82
>>>
>>> -Jey
>>>
>>> On Mon, Feb 17, 2014 at 6:15 PM, Shivaram Venkataraman
>>> <shiva...@eecs.berkeley.edu> wrote:
>>>> I ran into a weird bug today where trying to read a file from an HDFS
>>>> cluster built using Hadoop 2 gives an error saying "No FileSystem for
>>>> scheme: hdfs". Specifically, this only seems to happen when building
>>>> an assembly jar in the application and not when using sbt's run-main.
>>>>
>>>> The project's setup [0] is pretty simple and is only a slight
>>>> modification of the project used by the release audit tool. The sbt
>>>> assembly instructions [1] are mostly copied from Spark's sbt build
>>>> files.
>>>>
>>>> We run into this in SparkR as well, so it'll be great if anybody has
>>>> an idea on how to debug this. To reproduce, you can do the following:
>>>>
>>>> 1. Launch a Spark 0.9.0 EC2 cluster with --hadoop-major-version=2
>>>> 2. Clone https://github.com/shivaram/spark-utils
>>>> 3. Run release-audits/sbt_app_core/run-hdfs-test.sh
>>>>
>>>> Thanks
>>>> Shivaram
>>>>
>>>> [0] https://github.com/shivaram/spark-utils/blob/master/release-audits/sbt_app_core/src/main/scala/SparkHdfsApp.scala
>>>> [1] https://github.com/shivaram/spark-utils/blob/master/release-audits/sbt_app_core/build.sbt
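
For anyone who runs into this later: Hadoop 2's FileSystem class discovers implementations through java.util.ServiceLoader, and hadoop-common and hadoop-hdfs each ship their own META-INF/services/org.apache.hadoop.fs.FileSystem file. With a catch-all rule like MergeStrategy.first, only one jar's copy survives the merge, so the "hdfs" entry disappears. A quick sanity check you can run against an assembly jar (a minimal sketch, not from the thread):

    import java.util.ServiceLoader
    import org.apache.hadoop.fs.FileSystem
    import scala.collection.JavaConverters._

    // Prints the FileSystem implementations ServiceLoader can see on the
    // classpath. If hadoop-hdfs's services file was clobbered during
    // assembly, DistributedFileSystem won't be listed, which is exactly
    // when "No FileSystem for scheme: hdfs" shows up at runtime.
    object ListFileSystems {
      def main(args: Array[String]): Unit = {
        ServiceLoader.load(classOf[FileSystem]).asScala
          .foreach(fs => println(fs.getClass.getName))
      }
    }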