Hi all,
After benchmarking Hive and Pig, I found that the Group By operator in Pig is drastically slower than Hive's. Has anybody else experienced the same? And does anyone have tips for improving the performance of this operation? (Adding a DISTINCT, as suggested by an earlier post on here, doesn't help. I am currently re-running the benchmark with LZO compression enabled.)
Regards,
Ben
