Why do you need this, and could you just wrap the blocking save call in a future in your driver code?
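
To make the suggestion concrete, here is a minimal sketch of what "a future in the driver" could look like. It assumes a local master, the global execution context, and a hypothetical output path (`/tmp/async-save-sketch-output`); none of these come from the original proposal.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

import org.apache.spark.{SparkConf, SparkContext}

object AsyncSaveSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for illustration only.
    val conf = new SparkConf().setAppName("async-save-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 1000).map(_.toString)

      // saveAsTextFile blocks the calling thread, so run it on another thread.
      // Hypothetical path; the save fails if the directory already exists.
      val saved: Future[Unit] = Future {
        rdd.saveAsTextFile("/tmp/async-save-sketch-output")
      }

      // The driver thread stays free to submit other jobs in the meantime.
      val n = rdd.count()
      println(s"counted $n records while the save was in flight")

      // Wait for the save to finish before stopping the context.
      Await.result(saved, Duration.Inf)
    } finally {
      sc.stop()
    }
  }
}
```

One difference from the existing `AsyncRDDActions` methods: those return a `FutureAction`, which can cancel the underlying Spark job, while a plain Scala `Future` offers no cancellation handle.
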
On Fri, Aug 7, 2020 at 9:01 AM Antonin Delpeuch (lists) <li...@antonin.delpeuch.eu> wrote:
>
> Hi all,
>
> Following my request on the user mailing list [1], there does not seem
> to be any simple way to save RDDs to the file system in an asynchronous
> way. I am looking into implementing this, so I am first checking whether
> there is consensus around the idea.
>
> The goal would be to add methods such as `saveAsTextFileAsync` and
> `saveAsObjectFileAsync` to the RDD API.
>
> I am thinking about doing this by:
>
> - refactoring SparkHadoopWriter to allow for submitting jobs
>   asynchronously (with `submitJob` rather than `runJob`)
>
> - add a `saveAsHadoopFileAsync` method in `PairRDDFunctions`,
>   counterpart to the existing `saveAsHadoopFile`
>
> - add a `saveAsTextFileAsync` (and other formats) in `AsyncRDDActions`.
>
> Because SparkHadoopWriter is private, it is complicated to reimplement
> this functionality outside of Spark as a user, so I think this would be
> an API worth offering. It should be possible to implement this without
> too much code duplication hopefully.
>
> Cheers,
>
> Antonin
>
> [1]:
> http://apache-spark-user-list.1001560.n3.nabble.com/Async-API-to-save-RDDs-td38320.html
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org