Hmm thanks. Do you have a proposed solution?

________________________________
From: Jhon Anderson Cardenas Diaz <jhonderson2...@gmail.com>
Sent: Monday, March 18, 2019 1:24 PM
To: user
Subject: Spark - Hadoop custom filesystem service loading

Hi everyone,

On Spark 2.2.0, if you wanted to create a custom file system implementation, 
you just created an extension of org.apache.hadoop.fs.FileSystem and put the 
canonical name of the custom class in the file 
src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem.
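
For illustration, a minimal sketch of such an extension in Scala (the package/class 
name com.example.fs.CustomFileSystem and the "customfs" scheme are placeholders I 
made up, and the methods are left as stubs):

  package com.example.fs

  import java.net.URI
  import org.apache.hadoop.fs._
  import org.apache.hadoop.fs.permission.FsPermission
  import org.apache.hadoop.util.Progressable

  // Skeleton of a custom Hadoop FileSystem; a real implementation fills in the stubs.
  class CustomFileSystem extends FileSystem {
    override def getScheme: String = "customfs"        // scheme used in "customfs://" URIs
    override def getUri: URI = URI.create("customfs:///")

    // Abstract methods stubbed out for brevity:
    override def open(f: Path, bufferSize: Int): FSDataInputStream = ???
    override def create(f: Path, permission: FsPermission, overwrite: Boolean,
                        bufferSize: Int, replication: Short, blockSize: Long,
                        progress: Progressable): FSDataOutputStream = ???
    override def append(f: Path, bufferSize: Int, progress: Progressable): FSDataOutputStream = ???
    override def rename(src: Path, dst: Path): Boolean = ???
    override def delete(f: Path, recursive: Boolean): Boolean = ???
    override def listStatus(f: Path): Array[FileStatus] = ???
    override def setWorkingDirectory(newDir: Path): Unit = ???
    override def getWorkingDirectory: Path = ???
    override def mkdirs(f: Path, permission: FsPermission): Boolean = ???
    override def getFileStatus(f: Path): FileStatus = ???
  }

The service file src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem 
would then contain the single line com.example.fs.CustomFileSystem.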

Once you imported that jar dependency into your spark-submit application, the 
custom scheme was automatically loaded, and you could start using it with something 
like ds.load("customfs://path").
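
For example (assuming a SparkSession named spark and the jar already on the 
driver/executor classpath, e.g. via --jars):

  val ds = spark.read.format("parquet").load("customfs://some/path")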

But on Spark 2.4.0 the same approach no longer seems to work. If you do exactly the 
same thing, you get an error like "No FileSystem for customfs".

The only way I got this working on 2.4.0 was by specifying the Spark property 
spark.hadoop.fs.customfs.impl.
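
Roughly like this (jar, application, and class names are placeholders):

  spark-submit \
    --jars custom-fs.jar \
    --conf spark.hadoop.fs.customfs.impl=com.example.fs.CustomFileSystem \
    my-app.jar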

Do you guys consider this a bug, or is it an intentional change that should be 
documented somewhere?

Btw, digging a little into this, it seems the cause is that the FileSystem is now 
initialized before the actual dependencies are downloaded from the Maven repo (see 
https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala#L66). 
And since that initialization loads the available filesystems only once, at that 
point, the filesystems in the downloaded jars are not taken into account.
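
If it helps, as far as I can tell the config-property workaround works because Spark 
copies spark.hadoop.* entries into the Hadoop Configuration, and FileSystem checks 
fs.<scheme>.impl there before falling back to the service-loaded map. So an 
equivalent (untested) runtime sketch, again with a placeholder class name, would be:

  // Map the scheme to the implementation directly on the Hadoop configuration
  spark.sparkContext.hadoopConfiguration
    .set("fs.customfs.impl", "com.example.fs.CustomFileSystem")

  val ds = spark.read.format("parquet").load("customfs://some/path")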

Thanks.
