Re: Slow Group By operator

Cheolsoo Park Tue, 20 Aug 2013 11:58:11 -0700

Hi Benjamin,

Can you describe which step of group by is slow? Mapper side or reducer
side?

What's your query like? Can you share it? Do you call any algebraic UDFs
after the group by? I am wondering whether the combiner matters in your test.
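To illustrate why the combiner question matters, here is a toy Python model of a group by followed by COUNT (an algebraic aggregate), run as map, optional combine, and reduce steps. It is a sketch under simplifying assumptions, not Pig's actual implementation: the comma-separated record format, the function names (`map_phase`, `combine`, `reduce_phase`, `run`) and the sample data are all hypothetical. Because COUNT can be rebuilt from partial counts, each mapper can pre-aggregate and ship far fewer records to the reducers; if the foreach after the group calls a non-algebraic UDF, Pig generally cannot apply the combiner, so every record is shuffled.

```python
# Hypothetical toy model of GROUP BY key + COUNT; not Pig's code.
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int]


def map_phase(lines: Iterable[str]) -> List[Record]:
    """Emit (group_key, 1) for every input record; the key is the first field."""
    return [(line.split(",")[0], 1) for line in lines]


def combine(pairs: List[Record]) -> List[Record]:
    """Map-side partial aggregation: one (key, partial_count) per key per mapper."""
    partial: Counter = Counter()
    for key, n in pairs:
        partial[key] += n
    return list(partial.items())


def reduce_phase(shuffled: Iterable[Record]) -> Dict[str, int]:
    """Final aggregation: summing partial counts equals counting raw records."""
    totals: Counter = Counter()
    for key, n in shuffled:
        totals[key] += n
    return dict(totals)


def run(splits: List[List[str]], use_combiner: bool) -> Tuple[Dict[str, int], int]:
    """Return (per-key counts, number of records shuffled to the reducers)."""
    shuffled: List[Record] = []
    for split in splits:  # each split stands in for one mapper's input
        pairs = map_phase(split)
        shuffled.extend(combine(pairs) if use_combiner else pairs)
    return reduce_phase(shuffled), len(shuffled)


if __name__ == "__main__":
    sample = [["a,1", "b,2", "a,3", "a,4"], ["b,5", "b,6", "a,7"]]
    for flag in (False, True):
        result, n_shuffled = run(sample, use_combiner=flag)
        print(f"combiner={flag}: result={result}, records shuffled={n_shuffled}")
```

With the sample splits, both runs produce the same counts, but the combiner run ships 4 records to the reducers instead of 7; on real data with few distinct keys per mapper the gap is much larger.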

Thanks,
Cheolsoo

On Tue, Aug 20, 2013 at 2:27 AM, Benjamin Jakobus <[email protected]> wrote:

> Hi all,
>
> After benchmarking Hive and Pig, I found that the Group By operator in Pig
> is drastically slower than Hive's. Has anybody else experienced the same?
> And does anyone have tips for improving the performance of this
> operation? (Adding a DISTINCT, as suggested in an earlier post on this
> list, doesn't help. I am currently re-running the benchmark with LZO
> compression enabled.)
>
> Regards,
> Ben
>
