zeroTime marks the time when the streaming job started, and the first batch
of data is from zeroTime to zeroTime + slideDuration. The validity check that
(time - zeroTime) is a multiple of slideDuration ensures that a given
DStream generates RDDs at the right times. For example, say the
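To make the check concrete, here is a minimal sketch in plain Java. It is not Spark's actual code: it only models the rule described in this thread, that (time - zeroTime) must be a multiple of slideDuration. The class name, the method signature, the use of milliseconds as plain long values, the extra requirement that time lies after zeroTime, and the sample numbers are all assumptions made for illustration.

```java
// Simplified, hypothetical model of the time validity check discussed above.
// All times and durations are plain milliseconds; none of this is Spark API.
class ValidTimeSketch {

    // A time is treated as valid when it lies after zeroTime (assumption)
    // and (time - zeroTime) is a whole multiple of slideDuration.
    static boolean isValidTime(long time, long zeroTime, long slideDuration) {
        if (slideDuration <= 0) {
            throw new IllegalArgumentException("slideDuration must be positive");
        }
        return time > zeroTime && (time - zeroTime) % slideDuration == 0;
    }

    public static void main(String[] args) {
        long zeroTime = 1000L;      // hypothetical job start
        long slideDuration = 300L;  // hypothetical slide duration

        // Step through candidate times every 100 ms and report which ones
        // line up with the slide boundaries.
        for (long t = zeroTime; t <= zeroTime + 900L; t += 100L) {
            System.out.println(t + " -> " + isValidTime(t, zeroTime, slideDuration));
        }
    }
}
```

With these sample values, only 1300, 1600 and 1900 are reported as valid, since those are the only candidate times that sit a whole number of slides after zeroTime.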
OK, that patch does fix the key lookup exception. However, I am curious about
the time validity check, isValidTime
(https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L264).
Why does (time - zeroTime) have to be a multiple of slide
I am trying to aggregate over a sliding window and experimenting with the
slide duration. With some slide intervals the aggregation works, but it
mostly fails with the error below. The stream has records coming in every
100 ms.
JavaPairDStream<String, AggregateObject> aggregatedDStream =
There is a bug:
https://github.com/apache/spark/pull/961#issuecomment-45125185
On Tue, Jun 17, 2014 at 8:19 PM, Hatch M hatchman1...@gmail.com wrote:
Trying to aggregate over a sliding window, playing with the slide duration.
Playing around with the slide interval I can see the aggregation
Thanks! Will try to get the fix and retest.
On Tue, Jun 17, 2014 at 5:30 PM, onpoq l onpo...@gmail.com wrote:
There is a bug:
https://github.com/apache/spark/pull/961#issuecomment-45125185