[jira] [Created] (SPARK-4368) Ceph integration?

Serge Smertin (JIRA) Wed, 12 Nov 2014 09:06:11 -0800

Serge Smertin created SPARK-4368:
------------------------------------

             Summary: Ceph integration?
                 Key: SPARK-4368
                 URL: https://issues.apache.org/jira/browse/SPARK-4368
             Project: Spark
          Issue Type: Bug
          Components: Input/Output
            Reporter: Serge Smertin



There is a use case of storing a large number of relatively small BLOB objects 
(2-20 MB), which requires some ugly workarounds in HDFS environments. These 
BLOBs need to be processed close to the data itself, which is why the 
MapReduce paradigm is a good fit, since it schedules computation close to the data.

Ceph seems to be one of the systems that offers both of these properties 
(small files and data locality) - 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-July/032119.html. I 
already know that Spark supports GlusterFS - 
http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%3ccf657f2b.5b3a1%[email protected]%3E

So I wonder: could Spark be integrated with this storage solution, and how 
much effort would that take?




