[
https://issues.apache.org/jira/browse/SPARK-18361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-18361:
------------------------------------
Assignee: Apache Spark
> Expose RDD localCheckpoint in PySpark
> -------------------------------------
>
> Key: SPARK-18361
> URL: https://issues.apache.org/jira/browse/SPARK-18361
> Project: Spark
> Issue Type: New Feature
> Components: PySpark
> Reporter: Gabriel Huang
> Assignee: Apache Spark
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> As of today, rdd.localCheckpoint() is not accessible from PySpark.
> This is an important issue for machine learning users, as we often run
> iterative algorithms that perform operations such as joins in each iteration.
> If the lineage is not truncated, the lineage itself, the memory usage, and
> the computation time keep growing with every iteration.
> rdd.localCheckpoint() seems like the most straightforward way to truncate
> the lineage, but the Python API does not expose it.
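
The lineage growth described above can be illustrated with a small pure-Python toy model. This is only a sketch, not Spark code: ToyRDD, lineage_depth, local_checkpoint, and iterate are hypothetical names made up for illustration, and the real localCheckpoint() in the Scala API is understood to mark the RDD for checkpointing rather than return a truncated copy. The model only shows why a join in every iteration deepens the lineage and how dropping the ancestors resets it.

```python
# Toy model of RDD lineage. ToyRDD and all of its methods are hypothetical
# illustrations, not part of the Spark or PySpark API.

class ToyRDD:
    def __init__(self, data, parents=()):
        self.data = list(data)          # materialized (key, value) pairs
        self.parents = tuple(parents)   # ancestors this dataset was derived from

    def lineage_depth(self):
        """Length of the longest chain of ancestors behind this dataset."""
        if not self.parents:
            return 0
        return 1 + max(p.lineage_depth() for p in self.parents)

    def map(self, fn):
        return ToyRDD((fn(x) for x in self.data), parents=(self,))

    def join(self, other):
        # Inner join on the key of (key, value) pairs.
        lookup = {}
        for k, w in other.data:
            lookup.setdefault(k, []).append(w)
        joined = [(k, (v, w)) for k, v in self.data for w in lookup.get(k, [])]
        return ToyRDD(joined, parents=(self, other))

    def local_checkpoint(self):
        # Keep the data, forget the ancestors: the lineage is truncated here.
        return ToyRDD(self.data)


def iterate(rdd, weights, steps, checkpoint_every=None):
    """Run `steps` join-and-update iterations, optionally truncating lineage."""
    for i in range(1, steps + 1):
        rdd = rdd.join(weights).map(lambda kv: (kv[0], kv[1][0] * kv[1][1]))
        if checkpoint_every and i % checkpoint_every == 0:
            rdd = rdd.local_checkpoint()
    return rdd


start = ToyRDD([("a", 1.0), ("b", 2.0)])
weights = ToyRDD([("a", 2.0), ("b", 0.5)])

plain = iterate(start, weights, steps=12)
truncated = iterate(start, weights, steps=12, checkpoint_every=5)

print(plain.lineage_depth())      # grows by two (join + map) per iteration
print(truncated.lineage_depth())  # only the steps since the last truncation
```

Both runs produce the same data. Only the amount of ancestry that has to be tracked (and, in a real system, possibly recomputed) differs.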
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)