[ 
https://issues.apache.org/jira/browse/SPARK-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964972#comment-14964972
 ] 

Serge Smertin commented on SPARK-4368:
--------------------------------------

If it's decided that this should be hosted outside of the project, is there a 
documented way to add a new storage abstraction?

> Ceph integration?
> -----------------
>
>                 Key: SPARK-4368
>                 URL: https://issues.apache.org/jira/browse/SPARK-4368
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output
>            Reporter: Serge Smertin
>
> There is a use case of storing a large number of relatively small BLOB objects 
> (2-20 MB), which requires some ugly workarounds in HDFS environments. These 
> BLOBs need to be processed close to the data itself, which is why the 
> MapReduce paradigm is a good fit, as it guarantees data locality.
> Ceph seems to be one of the systems that supports both properties 
> (small files and data locality): 
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-July/032119.html. I 
> already know that Spark supports GlusterFS: 
> http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%3ccf657f2b.5b3a1%25ven...@yarcdata.com%3E
> So I wonder, could there be an integration with this storage solution, and 
> how much effort would it take?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
