Serge Smertin created SPARK-4368:
------------------------------------
Summary: Ceph integration?
Key: SPARK-4368
URL: https://issues.apache.org/jira/browse/SPARK-4368
Project: Spark
Issue Type: Bug
Components: Input/Output
Reporter: Serge Smertin
One use case is storing a large number of relatively small BLOB objects
(2-20 MB each), which requires some ugly workarounds in HDFS environments. These
BLOBs need to be processed close to where the data is stored, which is why the
MapReduce paradigm is a good fit: it is designed around data locality.
Ceph seems to be one of the systems that provides both properties
(efficient small-file storage and data locality):
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-July/032119.html
I already know that Spark supports GlusterFS:
http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%3ccf657f2b.5b3a1%[email protected]%3E
So I wonder: could Spark be integrated with Ceph as a storage solution, and how
much effort would that take?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)