Xiangrui Meng created SPARK-6717:
------------------------------------

             Summary: Clear shuffle files after checkpointing in ALS
                 Key: SPARK-6717
                 URL: https://issues.apache.org/jira/browse/SPARK-6717
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
    Affects Versions: 1.3.1, 1.4.0
            Reporter: Xiangrui Meng
            Assignee: Xiangrui Meng


In ALS iterations, we checkpoint RDDs to cut lineage and to reduce the number 
of shuffle files. However, shuffle files are cleaned up only when the JVM 
garbage collector on the driver reclaims the RDD objects that reference them, 
and a GC may not be triggered during ALS iterations. So after checkpointing, 
and before we let the RDD object go out of scope, we should clean its shuffle 
dependencies explicitly. This function could either stay inside ALS or go to 
Core.
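
To make the proposal concrete, here is a minimal toy model of "checkpoint, then 
clean shuffle dependencies explicitly". It does not use Spark at all: Node, Dep, 
collect_shuffle_ids and checkpoint_and_clean are hypothetical names invented for 
this sketch, and the set of integers merely stands in for shuffle files on disk. 
The point it illustrates is the ordering: the shuffle ids have to be gathered 
from the lineage before the checkpoint cuts it, otherwise nothing is left to 
walk.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical stand-in for an RDD: a name plus its parent dependencies."""
    name: str
    parents: List["Dep"] = field(default_factory=list)


@dataclass
class Dep:
    """Hypothetical dependency edge; shuffle_id is set only for shuffle edges."""
    parent: Node
    shuffle_id: Optional[int] = None


def collect_shuffle_ids(node: Node) -> Set[int]:
    """Walk the lineage of `node` and return every shuffle id it depends on."""
    found: Set[int] = set()
    seen: Set[int] = set()          # object ids of nodes already visited
    stack = [node]
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        for dep in current.parents:
            if dep.shuffle_id is not None:
                found.add(dep.shuffle_id)
            stack.append(dep.parent)
    return found


def checkpoint_and_clean(node: Node, shuffle_files: Set[int]) -> Set[int]:
    """Cut the lineage of `node`, then drop its shuffle ids from `shuffle_files`.

    Returns the ids that were actually removed. `shuffle_files` is modified
    in place and models the shuffle files currently on disk (an assumption
    of this sketch, not a Spark data structure).
    """
    ids = collect_shuffle_ids(node)   # must happen before the lineage is cut
    node.parents = []                 # models the checkpoint truncating lineage
    removed = ids & shuffle_files
    shuffle_files.difference_update(removed)
    return removed


# Example: two shuffles in the lineage, one unrelated shuffle file on disk.
ratings = Node("ratings")
user_factors = Node("userFactors", [Dep(ratings, shuffle_id=0)])
item_factors = Node("itemFactors", [Dep(user_factors, shuffle_id=1), Dep(ratings)])
files_on_disk = {0, 1, 2}
cleaned = checkpoint_and_clean(item_factors, files_on_disk)
```

In this model the cleanup is deterministic and happens at the checkpoint, 
rather than whenever a garbage collection happens to run.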

Until this feature is available, a workaround is to call System.gc() 
periodically on the driver to clean up the shuffle files of RDDs that have 
gone out of scope.


