It seems what Christopher said makes sense: because this round's RDD depends on the previous round's RDD, the lineage would grow without bound as time goes by.
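To make that reasoning concrete, here is a toy sketch in plain Python. It is not Spark code and does not use the Spark API: the `RDD` class, its `checkpoint` method, and `run_batches` are hypothetical names made up for illustration. It assumes a stateful stream in which each round's state RDD has two parents, the previous round's state RDD and the current batch, and it models "forgetting lineage after a checkpoint" as simply dropping the parent references.

```python
class RDD:
    """Toy stand-in for an RDD: it only remembers its parents (its lineage)."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)

    def checkpoint(self):
        # Assumption: after a checkpoint the data can be reloaded from stable
        # storage, so the scheduler no longer needs the parents to recompute it.
        self.parents = []

    def lineage_depth(self):
        # Length of the longest dependency chain ending at this RDD.
        if not self.parents:
            return 1
        return 1 + max(p.lineage_depth() for p in self.parents)


def run_batches(num_batches, checkpoint_every=None):
    """Return the lineage depth of the state RDD after each round."""
    state = RDD("state-0")
    depths = []
    for t in range(1, num_batches + 1):
        batch = RDD(f"batch-{t}")
        # This round's state depends on last round's state plus the new batch.
        state = RDD(f"state-{t}", parents=[state, batch])
        if checkpoint_every and t % checkpoint_every == 0:
            state.checkpoint()
        depths.append(state.lineage_depth())
    return depths
```

In this model, `run_batches(5)` returns `[2, 3, 4, 5, 6]`, so the depth grows by one every round, while `run_batches(6, checkpoint_every=3)` returns `[2, 3, 1, 2, 3, 1]`, so the depth stays bounded once lineage is cut at each checkpoint.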
I realize that streaming/examples/clickstream/PageViewStream.scala in the code base is not what Figure 3 in the paper describes, so I have no idea which application Figure 3 is talking about.

Mark, sorry, I don't quite understand what you said.

Thanks,
dachuan.

On Sat, Nov 2, 2013 at 4:35 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

> You're coming at the paper from a different context than that in which it
> was written. The paper doesn't claim that RDD lineage and state could grow
> indefinitely after the Spark Streaming changes were made. That growth was
> indefinite in early, pre-Streaming versions of Spark, however.
>
> On Sat, Nov 2, 2013 at 7:51 AM, dachuan <hdc1...@gmail.com> wrote:
>
> > Hi, developers,
> >
> > I found this sentence hard to understand, it's from sosp'13 spark
> > streaming paper:
> >
> > "Lineage cutoff: Because lineage graphs between RDDs
> > in D-Streams can grow indefinitely, we modified the
> > scheduler to forget lineage after an RDD has been checkpointed,
> > so that its state does not grow arbitrarily."
> >
> > In my personal understanding, the length of DStream chain is fixed, so
> > the RDDs these DStreams generate also have fixed length. Besides, the
> > RDDs don't depend on the RDDs in the previous round. So why does it
> > claim that lineage graph can grow indefinitely? when you say "grow
> > indefinitely", do you refer to lineage graph's width or the number of
> > lineage graphs?
> >
> > thanks,
> > dachuan.

--
Dachuan Huang
Ohio State University
Columbus, Ohio, U.S.A.