Re: Pickle Spark DataFrame

Justin Uang Tue, 03 Nov 2015 13:19:14 -0800

Is the Manager a Python multiprocessing manager? Why are you using
parallelism in Python when, in theory, Spark does most of the heavy
lifting?
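
If it is a multiprocessing manager, every value assigned to manager.dict()
gets pickled, and a DataFrame handle that is tied to the driver process will
not survive that. One possible workaround, sketched below under assumptions,
is to keep the objects in a process-local registry and put only a picklable
marker under each key in the shared dict. DriverBoundHandle and LocalRegistry
are hypothetical names made up for this illustration, not Spark or
multiprocessing APIs, and a plain dict stands in for the manager dict so the
sketch runs without Spark or a manager process.

```python
import pickle


class DriverBoundHandle:
    """Hypothetical stand-in for a Spark DataFrame: pickling it fails."""

    def __init__(self, name):
        self.name = name

    def __reduce__(self):
        # Mimic an object that cannot leave the process that created it.
        raise pickle.PicklingError("handle is bound to the driver process")


class LocalRegistry:
    """Keep objects in this process; share only picklable keys and markers."""

    def __init__(self, shared):
        self._shared = shared  # any mapping, e.g. a Manager().dict() proxy
        self._objects = {}     # the real objects never leave this process

    def put(self, key, obj):
        self._objects[key] = obj
        self._shared[key] = True  # only a picklable marker is shared

    def get(self, key):
        if key not in self._shared:
            raise KeyError(key)
        return self._objects[key]


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


handle = DriverBoundHandle("data")
registry = LocalRegistry(shared={})  # plain dict here, for illustration only
registry.put("id", handle)
same_handle = registry.get("id")     # the identical object, never serialized
```

Other processes would see the key but could not fetch the object itself, so
this only helps if all Spark calls go through the one process that owns the
registry.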


On Wed, Oct 28, 2015 at 4:27 PM agg212 <[email protected]> wrote:

> I would just like to be able to put a Spark DataFrame in a manager.dict()
> and get it back out (manager.dict() calls pickle on the object being
> stored). Ideally, I would store only a pointer to the DataFrame object so
> that it remains distributed within Spark (i.e., it is not materialized and
> then stored). Here is an example:
>
> from multiprocessing import Manager
>
> # assumes an existing SQLContext named sqlContext
> data = sqlContext.read.json(data_file)  # load file as a DataFrame
> cache = Manager().dict()  # thread-safe container
> cache['id'] = data  # store reference to data, not materialized result
> new_data = cache['id']  # get reference to distributed Spark DataFrame
> new_data.show()
