On 21 Nov 2016, at 17:26, Samy Dindane <s...@dindane.com> wrote:

Hi,

I'd like to extend the file:// file system and add some custom logic to the API 
that lists files.
I think I need to extend FileSystem or LocalFileSystem from 
org.apache.hadoop.fs, but I am not sure how to go about it exactly.


Subclass it, then declare it with a config entry like

  spark.hadoop.fs.samy.impl=SamyFSClass

so you can use URLs like samy://home/files/data/*
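
A minimal sketch of such a subclass, assuming you just want to wrap the local FS and hook its listing call; the package, class name, and filtering logic here are all illustrative, not anything Hadoop ships:

  package com.example

  import java.net.URI
  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileStatus, LocalFileSystem, Path}

  // Illustrative subclass: delegates everything to LocalFileSystem and
  // only decorates the directory-listing call.
  class SamyFileSystem extends LocalFileSystem {

    // Scheme under which the FS is looked up (fs.samy.impl).
    override def getScheme(): String = "samy"

    override def initialize(uri: URI, conf: Configuration): Unit = {
      super.initialize(uri, conf)
      // A real implementation registered under a new scheme may also need
      // to override getUri()/checkPath() so paths with samy:// resolve.
    }

    override def listStatus(f: Path): Array[FileStatus] = {
      val listed = super.listStatus(f)
      // Custom listing logic goes here; pass-through for now.
      listed
    }
  }

Then point Spark at it, e.g.

  spark-submit --conf spark.hadoop.fs.samy.impl=com.example.SamyFileSystem ...

and spark.read.text("samy://home/files/data/*") will go through your class.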


You can also rebind file:// to point to your new FS by overriding fs.file.impl
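
That rebinding is the same config key with "file" as the scheme; keeping the illustrative class name from above:

  spark.hadoop.fs.file.impl=com.example.SamyFileSystem

If you go this route, leave getScheme() returning "file", since the class is now serving the file:// scheme rather than a new one.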

There's a fairly formal definition of what a filesystem is meant to do:

https://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-common/filesystem/filesystem.html

and lots of contract tests for each of the operations; you can find them all in 
the hadoop-common test source tree. If you are thinking of doing anything 
non-trivial with filesystems, get these tests working before you start changing 
things. But be aware that since these tests don't generate load or concurrent 
requests, they aren't sufficient to show that stuff works; they only identify 
when it is broken at a basic level.
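
As a sketch of how those tests get wired to a new scheme, modelled on how the localfs contract is set up in the hadoop-common test sources (class names are from the 2.x tree — check them against your Hadoop version, and note a real contract also needs a contract-options XML resource declaring what the FS supports):

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.contract.{AbstractContractOpenTest, AbstractFSContract}
  import org.apache.hadoop.fs.contract.localfs.LocalFSContract

  // Reuse the local-FS contract, pointed at the custom scheme.
  class SamyFSContract(conf: Configuration) extends LocalFSContract(conf) {
    override def getScheme(): String = "samy"
  }

  // One suite per operation; repeat for create/delete/rename/seek/etc.
  class TestSamyContractOpen extends AbstractContractOpenTest {
    override protected def createContract(conf: Configuration): AbstractFSContract =
      new SamyFSContract(conf)
  }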


GlusterFS comes from Red Hat; they've got a connector which works with Hadoop 
and Spark code. Have you used it?


How to write a custom file system and make it usable by Spark?

Thank you,

Samy
