Re: [SPAM] Customized Aggregation Query on Spark SQL

2015-04-30 Thread Zhan Zhang
One optimization is to reduce the shuffle by first aggregating locally (keeping only the max for each name) and then calling reduceByKey. Thanks. Zhan Zhang On Apr 24, 2015, at 10:03 PM, ayan guha guha.a...@gmail.com wrote: Here you go t =
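A minimal sketch of the two-phase idea, written in plain Python so it runs without a cluster. It assumes hypothetical `(name, value)` records split into in-memory "partitions". Step one keeps only the max per name inside each partition. Step two merges those partial maxima the way `reduceByKey(max)` would after the shuffle. The helper names `local_max`, `merge_max` and `max_per_name` are illustrative and are not part of Spark.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (name, value)
Row = Tuple[str, int]


def local_max(partition: Iterable[Row]) -> Dict[str, int]:
    """Phase 1: keep only the max value per name within a single partition."""
    best: Dict[str, int] = {}
    for name, value in partition:
        if name not in best or value > best[name]:
            best[name] = value
    return best


def merge_max(partials: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Phase 2: combine per-partition maxima, as reduceByKey(max) would after the shuffle."""
    result: Dict[str, int] = {}
    for partial in partials:
        for name, value in partial.items():
            result[name] = max(value, result.get(name, value))
    return result


def max_per_name(partitions: List[List[Row]]) -> Dict[str, int]:
    """Run both phases. Only the per-partition maxima would cross the network."""
    return merge_max(local_max(p) for p in partitions)


if __name__ == "__main__":
    partitions = [
        [("alice", 3), ("bob", 7), ("alice", 9)],
        [("bob", 2), ("carol", 5), ("alice", 4)],
    ]
    shipped = sum(len(local_max(p)) for p in partitions)
    total = sum(len(p) for p in partitions)
    print(max_per_name(partitions))  # {'alice': 9, 'bob': 7, 'carol': 5}
    print(shipped, "records shuffled instead of", total)  # 5 records shuffled instead of 6
```

In PySpark the same shape would look roughly like `rdd.mapPartitions(lambda it: local_max(it).items()).reduceByKey(max)`. Spark's `reduceByKey` already merges values within each partition before the shuffle (a map-side combine, like a MapReduce combiner). `groupByKey` does not, and ships every row. The saving therefore matters most when a query would otherwise move all rows before taking the max.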

Re: [SPAM] Customized Aggregation Query on Spark SQL

2015-04-30 Thread Wenlei Xie
Hi Zhan, How would this be achieved? Should the data be partitioned by name in this case? Thank you! Best, Wenlei On Thu, Apr 30, 2015 at 7:55 PM, Zhan Zhang zzh...@hortonworks.com wrote: One optimization is to reduce the shuffle by first aggregate locally (only keep the max for each