Hi all,

Following up on my question on the user mailing list [1]: there does not seem to be any simple way to save RDDs to the file system asynchronously. I am looking into implementing this, so I would first like to check whether there is consensus around the idea.
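For context, here is a minimal sketch of the usual user-side workaround, assuming Scala and a local `SparkContext`. It runs the blocking `saveAsTextFile` action on another thread and hands back a `scala.concurrent.Future`. `BackgroundSaveSketch` and `saveAsTextFileInBackground` are hypothetical names used only for illustration, not Spark APIs.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

import scala.concurrent.{Await, ExecutionContext, Future, blocking}
import scala.concurrent.duration._

object BackgroundSaveSketch {
  // Hypothetical helper (not part of Spark): submits the blocking
  // saveAsTextFile action from a thread of `ec`, so the caller gets a
  // Future immediately instead of waiting for the job to finish.
  def saveAsTextFileInBackground[T](rdd: RDD[T], path: String, jobGroup: String)
                                   (implicit ec: ExecutionContext): Future[Unit] =
    Future {
      val sc = rdd.sparkContext
      // Job-group properties are thread-local, so the group is set on the
      // thread that actually submits the job; the driver can later call
      // sc.cancelJobGroup(jobGroup) to cancel it.
      sc.setJobGroup(jobGroup, s"save to $path", interruptOnCancel = false)
      // `blocking` tells the global pool this thread will block on the job.
      try blocking(rdd.saveAsTextFile(path))
      finally sc.clearJobGroup()
    }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("background-save").setMaster("local[2]"))
    implicit val ec: ExecutionContext = ExecutionContext.global
    // Write into a fresh temporary directory (saveAsTextFile fails if it exists).
    val out = java.nio.file.Files.createTempDirectory("background-save")
      .resolve("out").toString
    val pending = saveAsTextFileInBackground(sc.parallelize(1 to 1000), out, "save-1")
    // ... the driver is free to do other work here ...
    Await.result(pending, 5.minutes)
    sc.stop()
  }
}
```

The drawback is that the returned `Future` cannot cancel the underlying Spark job by itself; cancellation has to go through `sc.cancelJobGroup`, and only because the helper tags the job with a group. By contrast, the `FutureAction` returned by existing `AsyncRDDActions` methods such as `countAsync` exposes `cancel()` directly, which is what a built-in async save could offer.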
The goal would be to add methods such as `saveAsTextFileAsync` and `saveAsObjectFileAsync` to the RDD API. I am thinking about doing this by:

- refactoring `SparkHadoopWriter` to allow submitting jobs asynchronously (with `submitJob` rather than `runJob`);
- adding a `saveAsHadoopFileAsync` method to `PairRDDFunctions`, the counterpart of the existing `saveAsHadoopFile`;
- adding `saveAsTextFileAsync` (and equivalents for other formats) to `AsyncRDDActions`.

Because `SparkHadoopWriter` is private, it is hard for users to reimplement this functionality outside of Spark, so I think this would be an API worth offering. Hopefully it can be implemented without too much code duplication.

Cheers,
Antonin

[1]: http://apache-spark-user-list.1001560.n3.nabble.com/Async-API-to-save-RDDs-td38320.html