I would just like to be able to put a Spark DataFrame into a manager.dict() and get it back out (manager.dict() pickles any object being stored). Ideally, I would store only a reference to the DataFrame object so that it remains distributed within Spark (i.e., it is not materialized before being stored). Here is an example:
data = sqlContext.jsonFile(data_file)  # load JSON file into a DataFrame (jsonFile is on SQLContext, not SparkContext)
cache = manager.dict()                 # thread-safe shared container
cache['id'] = data                     # store a reference to data, not a materialized result
new_data = cache['id']                 # get the reference to the distributed Spark DataFrame back
new_data.show()

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Pickle-Spark-DataFrame-tp14803p14825.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
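The trouble is that manager.dict() lives in a separate server process, so every value crosses a process boundary and must be pickled, and a DataFrame holds live driver-side state (its SparkContext) that cannot be pickled. Below is a minimal sketch of both the failure and a possible workaround, in plain Python with no Spark: a threading.Lock stands in for the unpicklable DataFrame, and the names local_registry and cache are my own illustrative choices, not anything from the original post.

```python
from multiprocessing import Manager
import threading

# manager.dict() is backed by a server process; values are pickled on the
# way in and unpickled on the way out.
manager = Manager()
cache = manager.dict()

# Plain picklable values round-trip fine:
cache['nums'] = [1, 2, 3]
assert cache['nums'] == [1, 2, 3]

# Objects holding live process state (locks, sockets, a SparkContext)
# cannot be pickled, so storing them raises an error:
try:
    cache['obj'] = threading.Lock()  # stand-in for an unpicklable DataFrame
    stored = True
except Exception:
    stored = False

# Workaround sketch: keep the unpicklable object in an ordinary
# process-local dict and share only a small picklable key via the manager.
local_registry = {}                      # lives in this process only
local_registry['my_df'] = threading.Lock()
cache['id'] = 'my_df'                    # only the key crosses processes
obj = local_registry[cache['id']]        # resolve the key locally
```

The limitation of the workaround is that the registry is visible only in the process that created it; other processes would need Spark itself (e.g. registering the DataFrame as a temp table and looking it up by name) to resolve the key.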
