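They are different; also, this might be better suited for the user list. By default, persist caches the RDD in memory, keeping a single copy of each partition on one machine, although you can specify a different storage level (for example one that also uses disk, or one of the `_2` levels that replicates each partition to two nodes). A persisted RDD keeps its lineage, so lost partitions are recomputed from it. Checkpoint, on the other hand, writes the data out to a reliable persistent store and discards the dependency graph (lineage) used to compute the RDD. That is why it is often seen in iterative algorithms, which may build very large or complex dependency graphs over time.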
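To make the lineage difference concrete, here is a toy model in plain Python. It is a hypothetical sketch, not Spark's API: `ToyRDD`, `evict` and the dict used as a "reliable store" are illustrative stand-ins. `persist()` only marks data for caching and keeps the parent chain. `checkpoint()` saves the data and cuts the chain, so later recovery reads the saved copy instead of recomputing. In real Spark you would call `sc.setCheckpointDir(...)` and then `rdd.checkpoint()`, which is lazy and only materializes on the next action. This toy version checkpoints eagerly for simplicity.

```python
class ToyRDD:
    """Hypothetical toy model of an RDD's lineage; NOT Spark's API."""

    def __init__(self, compute, parent=None, name="source"):
        self.compute = compute   # derives this node's data from its parent's data
        self.parent = parent     # lineage edge; None for a source or after checkpoint
        self.name = name
        self.persisted = False
        self.cached = None       # in-memory copy kept once persist() is set

    def map(self, f, name):
        return ToyRDD(lambda data: [f(x) for x in data], parent=self, name=name)

    def persist(self):
        # Only marks the node for caching; the lineage stays intact.
        self.persisted = True
        return self

    def evict(self):
        # Simulates losing the cached copy (e.g. an executor dying).
        self.cached = None

    def collect(self):
        if self.cached is not None:
            return list(self.cached)
        parent_data = self.parent.collect() if self.parent is not None else None
        data = self.compute(parent_data)
        if self.persisted:
            self.cached = list(data)
        return list(data)

    def checkpoint(self, store):
        # Eagerly writes the data to a "reliable" store (a dict here) and
        # replaces the lineage with a read from that store.
        store[self.name] = self.collect()
        self.parent = None
        self.compute = lambda _unused: list(store[self.name])
        return self

    def lineage(self):
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


if __name__ == "__main__":
    rdd = ToyRDD(lambda _unused: [1, 2, 3])
    for i in range(3):
        rdd = rdd.map(lambda x: x + 1, name=f"step{i}")
    rdd.persist()
    print(rdd.collect())   # [4, 5, 6]
    print(rdd.lineage())   # ['step2', 'step1', 'step0', 'source']

    store = {}
    rdd.checkpoint(store)
    print(rdd.lineage())   # ['step2']
    rdd.evict()
    print(rdd.collect())   # [4, 5, 6], re-read from the store, not recomputed
```

Without the `checkpoint` call, evicting the cache would force the whole `source -> step0 -> step1 -> step2` chain to run again. That recomputation cost, and the growth of the chain itself, is what checkpointing avoids in long iterative jobs.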
On Saturday, April 30, 2016, Renyi Xiong <renyixio...@gmail.com> wrote:
> Hi,
>
> Is RDD.persist equivalent to RDD.checkpoint if they save the same number of
> copies (say 3) to disk?
>
> (I assume persist saves copies on different machines?)
>
> thanks,
> Renyi.