2015-07-20 23:29 GMT-07:00 Matei Zaharia <matei.zaha...@gmail.com>:
> I agree with this -- basically, to build on Reynold's point, you should be
> able to get almost the same performance by implementing either the Hadoop
> FileSystem API or the Spark Data Source API over Ignite in the right way.
> This would let people save data persistently in Ignite in addition to using
> it for caching, and it would provide a global namespace, optionally a
> schema, etc. You can still provide data locality, short-circuit reads, etc.
> with these APIs.
Absolutely agree. In fact, Ignite already provides a shared RDD
implementation, which is essentially a view over Ignite cache data; it also
adheres to the Spark DataFrame API. More information can be found here:
http://ignite.incubator.apache.org/features/igniterdd.html

Also, the Ignite in-memory filesystem (IGFS) is compliant with the Hadoop
filesystem API and can transparently replace HDFS if needed, so plugging it
into Spark should be fairly easy. More information can be found here:
http://ignite.incubator.apache.org/features/igfs.html

--Alexey
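[For readers following the thread: a minimal sketch of what the shared-RDD
usage mentioned above looks like. The cache name "partitioned" and the
default IgniteConfiguration are illustrative, and the exact IgniteContext
signature has varied across Ignite releases, so treat this as a sketch
rather than copy-paste code.]

```scala
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.{SparkConf, SparkContext}

object IgniteRddSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ignite-rdd-demo"))

    // IgniteContext wraps the SparkContext and attaches an Ignite node
    // to each executor (configuration closure is illustrative).
    val ic = new IgniteContext(sc, () => new IgniteConfiguration())

    // Expose an Ignite cache as a Spark RDD; state written here is
    // visible to other Spark jobs and plain Ignite clients.
    val sharedRDD = ic.fromCache[Int, Int]("partitioned")
    sharedRDD.savePairs(sc.parallelize(1 to 1000).map(i => (i, i * 2)))

    println(sharedRDD.filter(_._2 > 1000).count())
  }
}
```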
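[And a sketch of pointing Hadoop-compatible clients, including Spark, at
IGFS through core-site.xml. The igfs://igfs@localhost authority assumes a
local Ignite node running the default IGFS instance; adjust to your
deployment.]

```xml
<!-- core-site.xml fragment: replace HDFS with IGFS (sketch) -->
<property>
  <name>fs.igfs.impl</name>
  <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>igfs://igfs@localhost</value>
</property>
```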