Gabriel Huang created SPARK-18361:
-------------------------------------

             Summary: Expose RDD localCheckpoint in PySpark
                 Key: SPARK-18361
                 URL: https://issues.apache.org/jira/browse/SPARK-18361
             Project: Spark
          Issue Type: New Feature
          Components: PySpark
            Reporter: Gabriel Huang


As of today, rdd.localCheckpoint() is not accessible from PySpark.

This is an important issue for machine learning practitioners, as we often run 
iterative algorithms that perform operations like joins in each iteration. 

If the lineage is not truncated, it grows with every iteration, and memory usage 
and computation time explode. rdd.localCheckpoint() seems like the most 
straightforward way of truncating the lineage, but the Python API does not 
expose it.
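
To illustrate the problem, here is a minimal toy model in plain Python. It is
not Spark code and does not reflect Spark internals: ToyRDD, local_checkpoint,
lineage_depth, and iterate are hypothetical names invented for this sketch, and
a simple map stands in for the per-iteration join. It only shows how the chain
of parent references grows with each iteration and how dropping the parents
after materializing the data keeps the chain short.

```python
class ToyRDD:
    """Toy stand-in for an RDD: materialized data plus links to its parents."""

    def __init__(self, data, parents=()):
        self.data = list(data)
        self.parents = tuple(parents)

    def map(self, fn):
        # Each transformation yields a new node that remembers its parent.
        return ToyRDD((fn(x) for x in self.data), parents=(self,))

    def local_checkpoint(self):
        # Hypothetical truncation: keep the data, forget the parents.
        return ToyRDD(self.data)

    def lineage_depth(self):
        # Count how many ancestors must be walked to reach a root node.
        depth = 0
        node = self
        while node.parents:
            node = node.parents[0]
            depth += 1
        return depth


def iterate(rdd, steps, truncate_every=None):
    """Apply one transformation per step, optionally truncating the lineage."""
    for i in range(1, steps + 1):
        rdd = rdd.map(lambda x: x + 1)
        if truncate_every and i % truncate_every == 0:
            rdd = rdd.local_checkpoint()
    return rdd


if __name__ == "__main__":
    untruncated = iterate(ToyRDD([0, 1]), steps=10)
    truncated = iterate(ToyRDD([0, 1]), steps=10, truncate_every=3)
    print(untruncated.lineage_depth(), truncated.lineage_depth())
```

In this toy model, the depth without truncation equals the number of
iterations, while with periodic truncation it never exceeds the truncation
interval. The computed data is the same in both cases.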



