One optimization is to reduce the shuffle by first aggregating locally (keeping
only the max for each name within a partition), and then calling reduceByKey.
Thanks.
Zhan Zhang
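A minimal sketch of the suggestion above, written in plain Python so it runs without a cluster. The sample data and the helper names local_max and reduce_by_key are hypothetical and do not come from the thread. Each inner list stands in for one partition. In PySpark the same shape would roughly be rdd.mapPartitions(local_max).reduceByKey(max), assuming an RDD of (name, value) pairs.

```python
def local_max(partition):
    """Local step: keep only the largest value per name within one partition."""
    best = {}
    for name, value in partition:
        if name not in best or value > best[name]:
            best[name] = value
    return list(best.items())


def reduce_by_key(pairs, func):
    """Stand-in for the shuffle and reduce step: merge values sharing a key."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


# Hypothetical (name, value) records, already split into two partitions.
partitions = [
    [("alice", 3), ("bob", 7), ("alice", 9)],
    [("bob", 2), ("alice", 5), ("carol", 1)],
]

# After the local step, each partition emits at most one record per name,
# so fewer records would have to cross the network in a real shuffle.
pre_aggregated = [pair for part in partitions for pair in local_max(part)]

# Global step: combine the per-partition maxima into one max per name.
result = reduce_by_key(pre_aggregated, max)
```

The local step works on whatever partitioning the data already has, so in this sketch the records are not partitioned by name beforehand.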
On Apr 24, 2015, at 10:03 PM, ayan guha guha.a...@gmail.com wrote:
Here you go
t =
Hi Zhan,
How would this be achieved? Should the data be partitioned by name in this
case?
Thank you!
Best,
Wenlei
On Thu, Apr 30, 2015 at 7:55 PM, Zhan Zhang zzh...@hortonworks.com wrote:
One optimization is to reduce the shuffle by first aggregate locally
(only keep the max for each