Re: Issue while trying to aggregate with a sliding window

2014-06-19 Thread Tathagata Das
zeroTime marks the time when the streaming job started, and the first batch of data is from zeroTime to zeroTime + slideDuration. The validity check of time - zeroTime) being multiple of slideDuration is to ensure that for a given dstream, it generates RDD at the right times. For example, say the

Re: Issue while trying to aggregate with a sliding window

2014-06-18 Thread Hatch M
Ok that patch does fix the key lookup exception. However, curious about the time validity check..isValidTime ( https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L264 ) Why does (time - zerotime) have to be a multiple of slide

Issue while trying to aggregate with a sliding window

2014-06-17 Thread Hatch M
Trying to aggregate over a sliding window, playing with the slide duration. Playing around with the slide interval I can see the aggregation works but mostly fails with the below error. The stream has records coming in at 100ms. JavaPairDStreamString, AggregateObject aggregatedDStream =

Re: Issue while trying to aggregate with a sliding window

2014-06-17 Thread onpoq l
There is a bug: https://github.com/apache/spark/pull/961#issuecomment-45125185 On Tue, Jun 17, 2014 at 8:19 PM, Hatch M hatchman1...@gmail.com wrote: Trying to aggregate over a sliding window, playing with the slide duration. Playing around with the slide interval I can see the aggregation

Re: Issue while trying to aggregate with a sliding window

2014-06-17 Thread Hatch M
Thanks! Will try to get the fix and retest. On Tue, Jun 17, 2014 at 5:30 PM, onpoq l onpo...@gmail.com wrote: There is a bug: https://github.com/apache/spark/pull/961#issuecomment-45125185 On Tue, Jun 17, 2014 at 8:19 PM, Hatch M hatchman1...@gmail.com wrote: Trying to aggregate over a