Re: Efficient Top count in each window

2015-03-12 Thread Tathagata Das
Why are you repartitioning to 1? That will obviously be slow: you are converting a distributed operation into a single-node operation. Also consider using RDD.top(). If you define the ordering correctly (based on the count), you will get the top K without doing a shuffle for sortByKey. Much
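A minimal sketch of the RDD.top() suggestion, assuming the windowed counts form a DStream[(Long, Long)] of (rounded value, count) pairs, which is what countByValueAndWindow yields over Long values. The names TopCounts, byCount, topK and printTopK are hypothetical and only serve the illustration; the original poster's code may be structured differently.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

object TopCounts {
  // Compare (value, count) pairs by count only, so top() returns the largest counts.
  val byCount: Ordering[(Long, Long)] = Ordering.by[(Long, Long), Long](_._2)

  // top() keeps at most k candidates per partition and merges them on the driver,
  // so no repartition(1) and no sortByKey shuffle is involved.
  def topK(rdd: RDD[(Long, Long)], k: Int): Array[(Long, Long)] =
    rdd.top(k)(byCount)

  // Hypothetical wiring: `counts` stands for the result of countByValueAndWindow.
  def printTopK(counts: DStream[(Long, Long)], k: Int = 10): Unit =
    counts.foreachRDD { rdd =>
      topK(rdd, k).foreach { case (value, count) => println(s"$value -> $count") }
    }
}
```

The array returned by top() is already sorted in descending order of count, so it can be printed or stored as is for each window.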

Efficient Top count in each window

2015-03-12 Thread Laeeq Ahmed
Hi, I have a streaming application where I am computing the top 10 counts in each window, which seems slow. Is there an efficient way to do this? val counts = keyAndValues.map(x => math.round(x._3.toDouble)).countByValueAndWindow(Seconds(4), Seconds(4)) val topCounts =