Gabriel Huang created SPARK-18361:
-------------------------------------
Summary: Expose RDD localCheckpoint in PySpark
Key: SPARK-18361
URL: https://issues.apache.org/jira/browse/SPARK-18361
Project: Spark
Issue Type: New Feature
Components: PySpark
Reporter: Gabriel Huang
As of today, rdd.localCheckpoint() is not available in PySpark.
This is an important issue for machine learning practitioners, who often run
iterative algorithms that perform operations such as joins in every iteration.
If the lineage is not truncated, it grows with every iteration, and memory usage
and computation time explode. rdd.localCheckpoint() seems like the most
straightforward way of truncating the lineage, but unlike the Scala RDD API, the
Python API does not expose it.
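To illustrate why truncation matters, here is a minimal pure-Python toy model. It is not Spark code: ToyRDD, map, lineage_depth, collect, local_checkpoint and run are hypothetical names invented for this sketch. Each map() records a transformation on top of its parent, so the chain that must be replayed grows by one per iteration. local_checkpoint() materializes the current result and drops the parent chain. Unlike Spark's localCheckpoint(), which only marks the RDD and truncates its lineage once it is next computed, this toy materializes eagerly.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ToyRDD:
    """Hypothetical stand-in for an RDD: either materialized data or a
    lazy transformation of a parent (i.e. a link in its lineage)."""
    parent: Optional[ToyRDD] = None
    fn: Optional[Callable[[List[int]], List[int]]] = None
    data: Optional[List[int]] = None

    def map(self, f: Callable[[int], int]) -> ToyRDD:
        # Lazily record a transformation; nothing is computed yet.
        return ToyRDD(parent=self, fn=lambda xs: [f(x) for x in xs])

    def lineage_depth(self) -> int:
        # Number of transformations that must be replayed to compute this RDD.
        depth, node = 0, self
        while node.parent is not None:
            depth += 1
            node = node.parent
        return depth

    def collect(self) -> List[int]:
        # Recompute by walking the whole lineage back to materialized data.
        if self.data is not None:
            return list(self.data)
        return self.fn(self.parent.collect())

    def local_checkpoint(self) -> ToyRDD:
        # Toy version: materialize now and drop the parent pointer,
        # which truncates the lineage to depth 0.
        return ToyRDD(data=self.collect())


def run(iterations: int, checkpoint_every: Optional[int]) -> ToyRDD:
    # Simulate an iterative algorithm that adds one transformation per step.
    rdd = ToyRDD(data=[1, 2, 3])
    for i in range(1, iterations + 1):
        rdd = rdd.map(lambda x: x + 1)
        if checkpoint_every and i % checkpoint_every == 0:
            rdd = rdd.local_checkpoint()
    return rdd


plain = run(50, None)
truncated = run(50, 10)
print(plain.lineage_depth())      # 50
print(truncated.lineage_depth())  # 0
print(plain.collect() == truncated.collect())  # True
```

Both runs produce the same result, but without truncation every action has to replay all 50 transformations, and that chain keeps growing as the loop runs longer.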
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)