Weichen Xu created SPARK-19215:

             Summary: Add necessary check for `RDD.checkpoint` to avoid 
potential mistakes
                 Key: SPARK-19215
                 URL: https://issues.apache.org/jira/browse/SPARK-19215
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
            Reporter: Weichen Xu

Currently, `RDD.checkpoint` must be called before any job is executed on the RDD; 
otherwise `doCheckpoint` will never be called. This is a pitfall, so we should 
check for this case and throw an exception (or at least log a warning).
Also, if the RDD has not been persisted, checkpointing will cause the RDD to be 
recomputed, because the current implementation runs a separate job for 
checkpointing. I think this case should also log a warning that reminds the 
user to check whether they forgot to persist the RDD.
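
To make the two proposed checks concrete, here is a minimal sketch in plain 
Python. It is a toy model, not Spark code: the class `ToyRDD`, its fields, and 
the warning texts are all hypothetical, and the "separate job" is imitated by a 
second materialization pass as described above. It only illustrates where a 
warning could be raised.

```python
import warnings


class ToyRDD:
    """Hypothetical stand-in for an RDD; NOT Spark's actual implementation."""

    def __init__(self, compute):
        self._compute = compute            # function that produces the data
        self._cache = None                 # filled by the first job if persisted
        self.persisted = False
        self.job_has_run = False
        self.checkpoint_requested = False
        self.checkpointed_data = None
        self.compute_count = 0             # how many times the data was computed

    def persist(self):
        self.persisted = True
        return self

    def checkpoint(self):
        # Proposed check 1: checkpoint requested after a job has already run.
        if self.job_has_run:
            warnings.warn("checkpoint() called after a job ran on this RDD; "
                          "it will have no effect")
            return
        # Proposed check 2: checkpointing an RDD that is not persisted.
        if not self.persisted:
            warnings.warn("RDD is not persisted; checkpointing will recompute it")
        self.checkpoint_requested = True

    def _materialize(self):
        if self.persisted and self._cache is not None:
            return self._cache
        self.compute_count += 1
        data = list(self._compute())
        if self.persisted:
            self._cache = data
        return data

    def collect(self):
        first_job = not self.job_has_run
        data = self._materialize()
        self.job_has_run = True
        if first_job and self.checkpoint_requested:
            # Imitates the separate checkpointing job: a second pass over the data.
            self.checkpointed_data = self._materialize()
        return data


if __name__ == "__main__":
    rdd = ToyRDD(lambda: range(3)).persist()
    rdd.checkpoint()                       # requested before any job: no warning
    rdd.collect()
    print(rdd.compute_count)               # computed once thanks to persist()
```

In this model, calling `persist()` and then `checkpoint()` before the first 
`collect()` computes the data once, while skipping `persist()` computes it 
twice, which is the recomputation the second warning is meant to flag.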

