Hi Alexey,

SPARK-6479 (https://issues.apache.org/jira/browse/SPARK-6479) is for the plugin API, and SPARK-6112 (https://issues.apache.org/jira/browse/SPARK-6112) is for the HDFS plugin.
Thanks.

Zhan Zhang

On Jul 21, 2015, at 10:56 AM, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:

2015-07-20 23:29 GMT-07:00 Matei Zaharia <matei.zaha...@gmail.com>:

> I agree with this -- basically, to build on Reynold's point, you should
> be able to get almost the same performance by implementing either the
> Hadoop FileSystem API or the Spark Data Source API over Ignite in the
> right way. This would let people save data persistently in Ignite in
> addition to using it for caching, and it would provide a global
> namespace, optionally a schema, etc. You can still provide data
> locality, short-circuit reads, etc. with these APIs.

Absolutely agree. In fact, Ignite already provides a shared RDD implementation, which is essentially a view of Ignite cache data. This implementation adheres to the Spark DataFrame API. More information can be found here: http://ignite.incubator.apache.org/features/igniterdd.html

Also, the Ignite in-memory file system is compliant with the Hadoop FileSystem API and can transparently replace HDFS if needed. Plugging it into Spark should be fairly easy. More information can be found here: http://ignite.incubator.apache.org/features/igfs.html

--Alexey
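[Editor's sketch] Since Spark resolves file systems through its Hadoop configuration, the "plug IGFS into Spark" path mentioned above would mostly amount to registering Ignite's Hadoop-compatible file system in core-site.xml on Spark's classpath. The property names and class names below follow the Ignite 1.x Hadoop accelerator documentation and should be verified against the deployed Ignite version; this is an assumption-laden sketch, not a confirmed recipe from the thread:

```xml
<!-- core-site.xml: register Ignite's Hadoop-compatible file system (IGFS)
     so that igfs:// URIs are handled by Ignite instead of HDFS.
     Class names per the Ignite 1.x docs; verify for your version. -->
<configuration>
  <!-- Old (FileSystem) API implementation for the igfs:// scheme -->
  <property>
    <name>fs.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
  </property>
  <!-- New (AbstractFileSystem) API implementation for the igfs:// scheme -->
  <property>
    <name>fs.AbstractFileSystem.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
  </property>
</configuration>
```

With this configuration (and the Ignite Hadoop accelerator jars) on Spark's classpath, igfs:// paths could be read with the usual Spark calls, e.g. sc.textFile("igfs://..."), exactly as hdfs:// paths are today.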