Hi Alexey,

SPARK-6479 <https://issues.apache.org/jira/browse/SPARK-6479> is for the plugin
API, and SPARK-6112 <https://issues.apache.org/jira/browse/SPARK-6112> is for
the HDFS plugin.


Thanks.

Zhan Zhang

On Jul 21, 2015, at 10:56 AM, Alexey Goncharuk
<alexey.goncha...@gmail.com> wrote:


2015-07-20 23:29 GMT-07:00 Matei Zaharia <matei.zaha...@gmail.com>:
I agree with this -- basically, to build on Reynold's point, you should be able
to get almost the same performance by implementing either the Hadoop FileSystem
API or the Spark Data Source API over Ignite in the right way. This would let
people save data persistently in Ignite in addition to using it for caching,
and it would provide a global namespace, optionally a schema, etc. You can
still provide data locality, short-circuit reads, etc. with these APIs.

Absolutely agree.

In fact, Ignite already provides a shared RDD implementation, which is
essentially a live view over Ignite cache data. This implementation adheres to
the Spark DataFrame API. More information can be found here:
http://ignite.incubator.apache.org/features/igniterdd.html
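For context, basic usage of the shared RDD looks roughly like the sketch below. This is based on the IgniteRDD page linked above, not on anything stated in this thread; the cache name "partitioned" and the Spring config file "example-shared-rdd.xml" are placeholders taken from the Ignite examples.

```scala
// Sketch of IgniteRDD usage, assuming Ignite's ignite-spark module is on the
// classpath. Cache name and config path are illustrative, not prescriptive.
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("ignite-rdd-demo"))

// IgniteContext is the entry point: it pairs a SparkContext with an
// Ignite configuration.
val ic = new IgniteContext[Int, Int](sc, "example-shared-rdd.xml")

// fromCache returns an IgniteRDD -- a mutable view over the named cache,
// shared across Spark jobs and applications.
val sharedRDD = ic.fromCache("partitioned")

// Writes go straight into the Ignite cache...
sharedRDD.savePairs(sc.parallelize(1 to 1000).map(i => (i, i * 2)))

// ...and reads see data written by any other application using the cache.
println(sharedRDD.filter(_._2 > 1000).count())
```

Because the RDD is backed by the cache rather than by Spark's lineage, the data outlives the Spark application that wrote it.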

Also, the Ignite in-memory filesystem (IGFS) is compliant with the Hadoop
FileSystem API and can transparently replace HDFS if needed. Plugging it into
Spark should be fairly easy. More information can be found here:
http://ignite.incubator.apache.org/features/igfs.html
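Concretely, because IGFS implements the Hadoop FileSystem API, wiring it in is a Hadoop configuration change rather than a Spark code change. A minimal core-site.xml fragment might look like the following; the adapter class names are taken from the Ignite documentation, may vary across versions, and are not quoted from this thread:

```xml
<configuration>
  <!-- Map the igfs:// URI scheme to Ignite's Hadoop FileSystem adapter.
       Class names are assumptions from the Ignite docs, not from this
       thread; check them against your Ignite version. -->
  <property>
    <name>fs.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
  </property>
</configuration>
```

With that in place, a Spark job should be able to address IGFS paths the same way it addresses HDFS ones, e.g. sc.textFile("igfs://...") instead of sc.textFile("hdfs://...").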

--Alexey

