Hello everyone,

I would like to initiate a discussion about integrating Apache Spark with
OpenStack Swift.
(https://issues.apache.org/jira/browse/SPARK-938 was created a while ago.)

I created a patch (https://github.com/apache/spark/pull/1010) that
provides initial information on how to connect Swift and Spark. Currently it
uses Hadoop 2.3.0 and covers only Spark's standalone mode. The main purpose
of this patch is to give the community a way to experiment with this
integration. I have it fully working on my private cluster, where it works
very well and lets me run various analytics with Spark.

My next planned patches will include information on how to configure Swift
for other Spark cluster deployment modes, and on how to integrate Spark and
Swift with earlier versions of Hadoop.
I am confident that integration between Spark and Swift is a very important
feature that will greatly increase Spark's exposure.

The integration between Spark and Swift is very similar to how Spark 
integrates with S3.
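To make the S3 analogy concrete: in both cases Spark reads data through a Hadoop FileSystem implementation selected by URI scheme. The sketch below is illustrative only. It assumes the `hadoop-openstack` module (Hadoop 2.3.0+) is on the classpath and uses that module's `fs.swift.*` property names. The provider name, credentials, and auth URL are hypothetical placeholders, as are the helper names `swift_hadoop_conf` and `swift_uri`. The `spark.hadoop.` prefix assumes a Spark version that copies such properties into its Hadoop configuration.

```python
def swift_hadoop_conf(provider, auth_url, tenant, username, password,
                      public=True, prefix="spark.hadoop."):
    """Build Hadoop properties for a Swift provider (hypothetical helper).

    Property names follow Hadoop's hadoop-openstack module; `prefix` is
    prepended to every key (use "" to get raw keys for core-site.xml).
    """
    base = f"fs.swift.service.{provider}"
    props = {
        # FileSystem implementation for swift:// URIs (like s3n:// for S3)
        "fs.swift.impl": "org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem",
        f"{base}.auth.url": auth_url,   # Keystone token endpoint
        f"{base}.tenant": tenant,
        f"{base}.username": username,
        f"{base}.password": password,
        f"{base}.public": "true" if public else "false",
    }
    return {prefix + key: value for key, value in props.items()}


def swift_uri(container, provider, path):
    """Build a swift://<container>.<provider>/<path> URI."""
    return f"swift://{container}.{provider}/{path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical local Keystone endpoint and test credentials
    conf = swift_hadoop_conf("SparkTest", "http://127.0.0.1:5000/v2.0/tokens",
                             "test", "tester", "testing")
    for key, value in conf.items():
        print(f"{key}={value}")
    print(swift_uri("logs", "SparkTest", "/2014/data.txt"))
```

With PySpark, the resulting dict could be applied with `SparkConf().setAll(conf.items())`, and the data read with `sc.textFile(swift_uri("logs", "SparkTest", "2014/data.txt"))`, much as one would read an `s3n://` path.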

It would be great to hear comments, suggestions, and remarks from the community!

All the best,
Gil Vernik.
