Why are you repartitioning to 1? That would obviously be slow: you are
converting a distributed operation into a single-node operation.
Also consider using RDD.top(). If you define the ordering correctly (based on
the count), you will get the top K across the partitions without doing a shuffle for
sortByKey. Much
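
To make the RDD.top() suggestion concrete, here is a minimal sketch. It assumes the windowed counts arrive as (value, count) pairs, which is what countByValueAndWindow produces; the names TopK and topByCount are made up for illustration and are not part of Spark.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical helper, not a Spark API: wraps RDD.top() with an
// ordering on the count field of each (value, count) pair.
object TopK {
  // Returns the k pairs with the largest counts, largest first.
  // top() picks the k largest elements within each partition and then
  // merges those small results, so no sortByKey shuffle is needed.
  def topByCount[K](counts: RDD[(K, Long)], k: Int): Array[(K, Long)] =
    counts.top(k)(Ordering.by[(K, Long), Long](_._2))
}
```

In the streaming job this would probably be called once per window, for example counts.foreachRDD { rdd => TopK.topByCount(rdd, 10).foreach(println) }, with no repartition(1) beforehand.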
Hi,
I have a streaming application where I am doing a top-10 count in each window, which
seems slow. Is there a more efficient way to do this?
val counts = keyAndValues.map(x =>
math.round(x._3.toDouble)).countByValueAndWindow(Seconds(4), Seconds(4))
val topCounts =