Re: Async RDD saves

Sean Owen Fri, 07 Aug 2020 08:38:20 -0700

Why do you need this, and could you just wrap the save in a future in your driver code?
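
As a rough illustration of that driver-side approach, here is a minimal sketch. `AsyncSave` and `saveAsync` are hypothetical names, not part of Spark; the helper only uses the standard Scala `Future` and assumes the wrapped call is an ordinary blocking action such as `rdd.saveAsTextFile(path)`.

```scala
import scala.concurrent.{ExecutionContext, Future, blocking}

object AsyncSave {
  // Hypothetical helper (not a Spark API): runs a blocking save on the given
  // execution context and returns a Future that completes when the save
  // returns, or fails with whatever exception the save throws.
  def saveAsync(blockingSave: => Unit)(implicit ec: ExecutionContext): Future[Unit] =
    Future {
      // Mark the body as blocking so the pool may compensate with extra threads.
      blocking(blockingSave)
    }
}

// Assumed usage in driver code, with an `rdd`, a `path`, and an implicit
// ExecutionContext in scope:
//   val done: Future[Unit] = AsyncSave.saveAsync(rdd.saveAsTextFile(path))
//   done.failed.foreach(e => println(s"save failed: $e"))
```

One difference from a `submitJob`-based API: `submitJob` returns a `FutureAction`, which has a `cancel` method, whereas a plain `Future` does not.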


On Fri, Aug 7, 2020 at 9:01 AM Antonin Delpeuch (lists)
<li...@antonin.delpeuch.eu> wrote:
>
> Hi all,
>
> Following my request on the user mailing list [1], there does not seem
> to be any simple way to save RDDs to the file system in an asynchronous
> way. I am looking into implementing this, so I am first checking whether
> there is consensus around the idea.
>
> The goal would be to add methods such as `saveAsTextFileAsync` and
> `saveAsObjectFileAsync` to the RDD API.
>
> I am thinking about doing this by:
>
> - refactoring SparkHadoopWriter to allow for submitting jobs
> asynchronously (with `submitJob` rather than `runJob`)
>
> - adding a `saveAsHadoopFileAsync` method in `PairRDDFunctions`, as a
> counterpart to the existing `saveAsHadoopFile`
>
> - adding a `saveAsTextFileAsync` method (and equivalents for other
> formats) in `AsyncRDDActions`.
>
> Because SparkHadoopWriter is private, it is complicated for a user to
> reimplement this functionality outside of Spark, so I think this would be
> an API worth offering. Hopefully, it should be possible to implement this
> without too much code duplication.
>
> Cheers,
>
> Antonin
>
> [1]:
> http://apache-spark-user-list.1001560.n3.nabble.com/Async-API-to-save-RDDs-td38320.html
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>

