Hi Benjamin,

Can you describe which step of the group by is slow? The mapper side or the reducer side?
What's your query like? Can you share it? Do you call any algebraic UDF after the group by? I am wondering whether the combiner matters in your test.

Thanks,
Cheolsoo

On Tue, Aug 20, 2013 at 2:27 AM, Benjamin Jakobus <[email protected]> wrote:

> Hi all,
>
> After benchmarking Hive and Pig, I found that the Group By operator in Pig
> is drastically slower than Hive's. I was wondering whether anybody has
> experienced the same? And whether people may have any tips for improving
> the performance of this operation? (Adding a DISTINCT as suggested by an
> earlier post on here doesn't help. I am currently re-running the benchmark
> with LZO compression enabled).
>
> Regards,
> Ben
