Yes, I believe that is current behavior. Essentially, the first few RDDs will be partial windows (assuming window duration > sliding interval).
TD On Mon, Mar 24, 2014 at 1:12 PM, Adrian Mocanu <amoc...@verticalscope.com> wrote: > I have what I would call unexpected behaviour when using window on a stream. > > I have 2 windowed streams with a 5s batch interval. One window stream is > (5s,5s)=smallWindow and the other (10s,5s)=bigWindow > > What I've noticed is that the 1st RDD produced by bigWindow is incorrect and > is of the size 5s not 10s. So instead of waiting 10s and producing 1 RDD > with size 10s, Spark produced the 1st 10s RDD of size 5s. > > Why is this happening? To me it looks like a bug; Matei or TD can you verify > that this is correct behaviour? > > > > > > I have the following code > > val ssc = new StreamingContext(conf, Seconds(5)) > > > > val smallWindowStream = ssc.queueStream(smallWindowRddQueue) > > val bigWindowStream = ssc.queueStream(bigWindowRddQueue) > > > > val smallWindow = smallWindowReshapedStream.window(Seconds(5), Seconds(5)) > > .reduceByKey((t1, t2) => (t1._1, t1._2, t1._3 + t2._3)) > > val bigWindow = bigWindowReshapedStream.window(Seconds(10), Seconds(5)) > > .reduceByKey((t1, t2) => (t1._1, t1._2, t1._3 + t2._3)) > > > > -Adrian > >