What Christopher said seems to make sense: because this round's RDD
depends on the last round's RDD, the lineage would keep growing without
bound as time goes by.
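
To make that concrete, here is a toy model of the dependency, not Spark
code. ToyRDD and run_batches are made-up names for illustration, and
"checkpoint" here simply drops the parent link, which is my reading of
what the paper means by forgetting lineage after a checkpoint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyRDD:
    """Hypothetical stand-in for an RDD: it only tracks its parent link."""
    batch: int
    parent: Optional["ToyRDD"] = None

    def lineage_depth(self) -> int:
        # Count how many ancestors must be kept to recompute this RDD.
        depth = 0
        node = self
        while node.parent is not None:
            depth += 1
            node = node.parent
        return depth

    def checkpoint(self) -> None:
        # Assumption: once the data is saved to reliable storage, the
        # ancestors are no longer needed, so the link is dropped.
        self.parent = None


def run_batches(num_batches: int, checkpoint_every: Optional[int] = None) -> int:
    """Return the deepest lineage seen when each batch's state RDD
    depends on the previous batch's state RDD."""
    state = None
    max_depth = 0
    for batch in range(num_batches):
        state = ToyRDD(batch=batch, parent=state)
        max_depth = max(max_depth, state.lineage_depth())
        if checkpoint_every and (batch + 1) % checkpoint_every == 0:
            state.checkpoint()
    return max_depth
```

Without checkpointing, the depth in this model grows with the number of
batches; with a checkpoint every k batches it stays bounded by k.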

I realize that streaming/examples/clickstream/PageViewStream.scala in the
code base is not what Figure 3 in the paper describes, so I have no idea
which application Figure 3 is talking about.

Mark, sorry, I don't quite understand what you said.

thanks,
dachuan.


On Sat, Nov 2, 2013 at 4:35 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

> You're coming at the paper from a different context than that in which it
> was written.  The paper doesn't claim that RDD lineage and state could grow
> indefinitely after the Spark Streaming changes were made.  That growth was
> indefinite in early, pre-Streaming versions of Spark, however.
>
>
>
> On Sat, Nov 2, 2013 at 7:51 AM, dachuan <hdc1...@gmail.com> wrote:
>
> > Hi, developers,
> >
> > I found this sentence hard to understand, it's from sosp'13 spark streaming
> > paper:
> >
> > "Lineage cutoff: Because lineage graphs between RDDs
> > in D-Streams can grow indefinitely, we modified the
> > scheduler to forget lineage after an RDD has been checkpointed,
> > so that its state does not grow arbitrarily."
> >
> > In my personal understanding, the length of DStream chain is fixed, so the
> > RDDs these DStreams generate also have fixed length. Besides, the RDDs
> > don't depend on the RDDs in the previous round. So why does it claim that
> > lineage graph can grow indefinitely? when you say "grow indefinitely", do
> > you refer to lineage graph's width or the number of lineage graphs?
> >
> > thanks,
> > dachuan.
> >
>



-- 
Dachuan Huang
Cellphone: 614-390-7234
2015 Neil Avenue
Ohio State University
Columbus, Ohio
U.S.A.
43210
