---------- Forwarded message ----------
From: Priya PM <pmpr...@gmail.com>
Date: Fri, May 26, 2017 at 8:54 PM
Subject: Re: Spark checkpoint - nonstreaming
To: Jörn Franke <jornfra...@gmail.com>


Oh, how do I do that? I don't see it mentioned anywhere in the documentation.

I have followed this link to understand the checkpoint workflow:
https://github.com/JerryLead/SparkInternals/blob/master/markdown/english/6-CacheAndCheckpoint.md

But it does not seem to work as described below: on the second run, the
application does not read from the checkpointed RDD.



*Q: How to read a checkpointed RDD?*

runJob() will call finalRDD.partitions() to determine how many tasks there
will be. rdd.partitions() checks whether the RDD has been checkpointed via
RDDCheckpointData, which manages the checkpointed RDD. If yes, it returns
the partitions of the RDD (Array[Partition]). When rdd.iterator() is called
to compute an RDD partition, computeOrReadCheckpoint(split: Partition) is
also called to check whether the RDD is checkpointed. If yes, the parent
RDD's iterator(), a.k.a. CheckpointRDD.iterator(), will be called.
CheckpointRDD reads files on the file system to produce the RDD partition.
That is why a parent CheckpointRDD is added to the checkpointed RDD behind
the scenes.
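
To make that flow concrete, here is the minimal pattern I am assuming
(a simplified sketch with placeholder data, not the actual MovieLensALS code):

import org.apache.spark.{SparkConf, SparkContext}

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("checkpoint-sketch"))
    sc.setCheckpointDir("/root/checkpointDir")

    val rdd = sc.parallelize(1 to 1000).map(_ * 2)
    rdd.cache()        // recommended, otherwise the checkpoint write recomputes the lineage
    rdd.checkpoint()   // only marks the RDD; nothing is written yet
    rdd.count()        // first action runs the job and then writes the checkpoint files

    println(rdd.isCheckpointed)    // true
    println(rdd.getCheckpointFile) // something like Some(file:/root/checkpointDir/<uuid>/rdd-N)

    // Later actions on this RDD, within this same SparkContext, read the
    // partitions back via the CheckpointRDD parent instead of recomputing them.
    rdd.count()
    sc.stop()
  }
}

Within a single run this matches the description above; what I do not see is
how a fresh run of the same application is supposed to pick up rdd-28 again.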

On Fri, May 26, 2017 at 8:48 PM, Jörn Franke <jornfra...@gmail.com> wrote:

> Did you explicitly tell the application to read from the checkpoint
> directory? You have to do this yourself in non-streaming scenarios.
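>
> E.g. something along these lines (a rough, untested sketch; savePath and
> expensiveComputation() are made-up placeholders): instead of relying on the
> rdd checkpoint files, save the expensive result explicitly in the first run
> and load it in the next run if it already exists:
>
> import org.apache.hadoop.fs.{FileSystem, Path}
>
> val savePath = "/root/savedResult"            // made-up path
> val fs = FileSystem.get(sc.hadoopConfiguration)
> val result =
>   if (fs.exists(new Path(savePath)))
>     sc.objectFile[(Int, Double)](savePath)    // second run: load what the first run saved
>   else {
>     val r = expensiveComputation(sc)          // placeholder for the costly step
>     r.saveAsObjectFile(savePath)
>     r
>   }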
>
> On 26. May 2017, at 16:52, Priya PM <pmpr...@gmail.com> wrote:
>
> Yes, I did set the checkpoint directory, and I could see the checkpointed
> RDD too.
>
> [root@ rdd-28]# pwd
> /root/checkpointDir/9dd1acf0-bef8-4a4f-bf0e-f7624334abc5/rdd-28
>
> I am using the MovieLens application to check the Spark checkpointing
> feature.
>
> code: MovieLensALS.scala
>
> def main(args: Array[String]) {
>   ..
>   ..
>   sc.setCheckpointDir("/root/checkpointDir")
> }
>
>
>
> On Fri, May 26, 2017 at 8:09 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Do you have some source code?
>> Did you set the checkpoint directory?
>>
>> > On 26. May 2017, at 16:06, Priya <pmpr...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > With nonstreaming spark application, did checkpoint the RDD and I could
>> see
>> > the RDD getting checkpointed. I have killed the application after
>> > checkpointing the RDD and restarted the same application again
>> immediately,
>> > but it doesn't seem to pick from checkpoint and it again checkpoints the
>> > RDD. Could anyone please explain why am I seeing this behavior, why it
>> is
>> > not picking from the checkpoint and proceeding further from there on the
>> > second run of the same application. Would really help me understand
>> spark
>> > checkpoint work flow if I can get some clarity on the behavior. Please
>> let
>> > me know if I am missing something.
>> >
>> > [root@checkpointDir]# ls
>> > 9dd1acf0-bef8-4a4f-bf0e-f7624334abc5  a4f14f43-e7c3-4f64-a980-8483b42bb11d
>> >
>> > [root@9dd1acf0-bef8-4a4f-bf0e-f7624334abc5]# ls -la
>> > total 0
>> > drwxr-xr-x. 3 root root  20 May 26 16:26 .
>> > drwxr-xr-x. 4 root root  94 May 26 16:24 ..
>> > drwxr-xr-x. 2 root root 133 May 26 16:26 rdd-28
>> >
>> > [root@priya-vm 9dd1acf0-bef8-4a4f-bf0e-f7624334abc5]# cd rdd-28/
>> > [root@priya-vm rdd-28]# ls
>> > part-00000  part-00001  _partitioner
>> >
>> > Thanks
>>
>
>
