Re: Issue while trying to aggregate with a sliding window

2014-06-17 Thread onpoq l
There is a bug:

https://github.com/apache/spark/pull/961#issuecomment-45125185


On Tue, Jun 17, 2014 at 8:19 PM, Hatch M <hatchman1...@gmail.com> wrote:
 I am trying to aggregate over a sliding window and have been experimenting
 with the slide duration. The aggregation works for some slide intervals
 but mostly fails with the error below. The stream has records coming in
 every 100 ms.

 JavaPairDStream<String, AggregateObject> aggregatedDStream =
     pairsDStream.reduceByKeyAndWindow(aggFunc, new Duration(6), new Duration(60));

 14/06/18 00:14:46 INFO dstream.ShuffledDStream: Time 1403050486900 ms is
 invalid as zeroTime is 1403050485800 ms and slideDuration is 6 ms and
 difference is 1100 ms
 14/06/18 00:14:46 ERROR actor.OneForOneStrategy: key not found:
 1403050486900 ms
 java.util.NoSuchElementException: key not found: 1403050486900 ms
 at scala.collection.MapLike$class.default(MapLike.scala:228)

 Any hints on what's going on here?
 Thanks!
 Hatch
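
The "Time ... is invalid" line in the log above reports a difference of 1100 ms against a slideDuration of 6 ms, and 1100 is not a whole multiple of 6. A minimal sketch of that arithmetic follows, assuming the validity test is "the offset from zeroTime must be a positive multiple of slideDuration". The class and method names are hypothetical and are not part of Spark's API, and the 100 ms value is only an illustrative comparison.

```java
public class SlideCheck {
    // Hypothetical helper (not a Spark API): returns true when timeMs lies
    // a positive whole number of slide intervals after zeroTimeMs.
    static boolean isOnSlideBoundary(long timeMs, long zeroTimeMs, long slideMs) {
        long diff = timeMs - zeroTimeMs;
        return diff > 0 && diff % slideMs == 0;
    }

    public static void main(String[] args) {
        long zeroTime = 1403050485800L; // zeroTime from the log
        long time = 1403050486900L;     // rejected time from the log

        // Difference between the two timestamps, in milliseconds.
        System.out.println("difference = " + (time - zeroTime) + " ms");

        // Slide duration of 6 ms, as reported in the log.
        System.out.println("slide 6 ms:   " + isOnSlideBoundary(time, zeroTime, 6));

        // Illustrative slide duration of 100 ms (an assumption, not from the log).
        System.out.println("slide 100 ms: " + isOnSlideBoundary(time, zeroTime, 100));
    }
}
```

Under this assumption the 6 ms case is rejected because 1100 leaves a remainder of 2 when divided by 6, while a slide duration that divides 1100 evenly would pass the same test.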



shuffling using netty in spark streaming

2014-06-12 Thread onpoq l
Hi,

1. Does Netty perform better than the basic method for shuffling? I found
that the latency caused by shuffling in a streaming job is not stable with
the basic method.

2. However, after I turn on Netty for shuffling, I see results only for the
first two batches, and then no output at all. I'm not sure whether the way
I turn on Netty is correct:

val conf = new SparkConf().set("spark.shuffle.use.netty", "true")

Thanks.

Boduo Li