Re: checkpointing without streaming?

2017-05-18 Thread Tathagata Das
y = sc.parallelize(x ,2).map( c => c*2)
>>> y.checkpoint
>>> y.count
>>>
>>> Is it possible to read the checkpointed RDD in another application?

Re: checkpointing without streaming?

2017-05-18 Thread Neelesh Sambhajiche
al x = List(1,2,3,4)
>> val y = sc.parallelize(x ,2).map( c => c*2)
>> y.checkpoint
>> y.count
>>
>> Is it possible to read the checkpointed RDD in another application?

Re: checkpointing without streaming?

2017-05-17 Thread Tathagata Das
ion?
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/checkpointing-without-streaming-tp4541p28691.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: checkpointing without streaming?

2017-05-17 Thread neelesh.sa
RDD in another application?
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/checkpointing-without-streaming-tp4541p28691.html
Sent from the Apache Spark User List mailing list archive at Nabble.

checkpointing without streaming?

2014-04-21 Thread Diana Carroll
I'm trying to understand when I would want to checkpoint an RDD rather than just persist to disk. Every reference I can find to checkpoint relates to Spark Streaming. But the method is defined in the core Spark library, not Streaming. Does it exist solely for streaming, or are there

Re: checkpointing without streaming?

2014-04-21 Thread Xiangrui Meng
Checkpointing clears an RDD's dependencies. You might need to checkpoint to cut a long lineage in iterative algorithms.
-Xiangrui
On Mon, Apr 21, 2014 at 11:34 AM, Diana Carroll dcarr...@cloudera.com wrote:
> I'm trying to understand when I would want to checkpoint an RDD rather than just persist to disk.
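
As a rough illustration of the point about cutting a long lineage in iterative algorithms, here is a minimal sketch (not code from this thread). It assumes a local SparkContext; the object name, the temporary checkpoint directory, the 30 `map` steps, and the every-10-iterations interval are all made up for illustration. On a real cluster the checkpoint directory would normally be on reliable shared storage such as HDFS.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; not part of Spark or of the original thread.
object CheckpointLineageSketch {

  // Runs a toy "iterative algorithm": 30 chained map steps over 1..4,
  // checkpointing every 10th iteration so the lineage does not keep growing.
  def run(sc: SparkContext): Array[Int] = {
    // Assumption: a local temp directory is good enough for a local-mode demo.
    sc.setCheckpointDir(Files.createTempDirectory("spark-checkpoint").toString)

    var rdd = sc.parallelize(1 to 4, 2)
    for (i <- 1 to 30) {
      rdd = rdd.map(_ + 1)
      if (i % 10 == 0) {
        rdd.cache()      // avoids recomputing the RDD when the checkpoint is written
        rdd.checkpoint() // only marks the RDD; must be called before any action on it
        rdd.count()      // the action triggers the actual checkpoint write
      }
    }

    println(s"checkpointed: ${rdd.isCheckpointed}")
    println(rdd.toDebugString) // lineage as Spark currently sees it
    rdd.collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("checkpoint-lineage-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(run(sc).mkString(","))
    finally sc.stop()
  }
}
```

After the final checkpoint, `toDebugString` should show a short lineage rooted at the checkpointed data rather than 30 nested map steps. Removing the `checkpoint()` call and comparing the two outputs is a quick way to see the dependency truncation described above.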

Re: checkpointing without streaming?

2014-04-21 Thread Diana Carroll
When might that be necessary or useful? Presumably I can persist and replicate my RDD to avoid re-computation, if that's my goal. What advantage does checkpointing provide over disk persistence with replication?
On Mon, Apr 21, 2014 at 2:42 PM, Xiangrui Meng men...@gmail.com wrote:

Re: checkpointing without streaming?

2014-04-21 Thread Tathagata Das
Diana, that is a good question. When you persist an RDD, the system still remembers the whole lineage of parent RDDs that created that RDD. If one of the executors fails and the persisted data is lost (both local disk and memory data will be lost), then the lineage is used to recreate the RDD. The