Re: About Issue PIG-841

Daniel Dai Mon, 07 Mar 2011 10:51:55 -0800

The sampling job in Pig is used by "order by" and "skewed join". It is translated to a single map-reduce job. In the map, we sample the data at a configurable interval; in the reduce, we do a "group all" followed by a nested foreach. Within the foreach, we do a nested sort and then feed the result to a UDF ("order by" and "skewed join" use different UDFs).
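
To make the shape of that job concrete, here is a minimal sketch in plain Python. It is not Pig's code: the function names, the "keep every N-th record" sampling rule, and the boundary-picking step that stands in for the UDF are all assumptions made for illustration.

```python
from typing import Iterable, List


def map_sample(records: Iterable[int], interval: int) -> List[int]:
    """Map side: keep every `interval`-th record (assumed sampling rule)."""
    return [r for i, r in enumerate(records) if i % interval == 0]


def reduce_boundaries(samples: List[int], num_partitions: int) -> List[int]:
    """Reduce side: "group all" puts every sample into one bag. Sort the
    bag (the nested sort), then let a stand-in for the UDF pick
    partition boundaries from the sorted samples."""
    ordered = sorted(samples)
    if not ordered or num_partitions < 2:
        return []
    step = len(ordered) / num_partitions
    return [ordered[int(step * k)] for k in range(1, num_partitions)]


# Hypothetical input: the integers 100 down to 1.
data = list(range(100, 0, -1))
samples = map_sample(data, interval=10)
boundaries = reduce_boundaries(samples, num_partitions=4)
```

The point of the sketch is only that the expensive step on the reduce side is the sort of the single "group all" bag.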

In PIG-1038, we optimize a nested sort using Hadoop secondary sort where possible. The sampling job fits the bill, so PIG-841 is fixed automatically.
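
As a rough illustration of the general secondary-sort idea (again plain Python, not Pig or Hadoop code, with a hypothetical function name): the shuffle sorts on a composite of group key and value but groups on the group key alone, so the reducer receives its values already ordered and does not need to sort the bag itself.

```python
from itertools import groupby
from operator import itemgetter


def shuffle_with_secondary_sort(pairs):
    """Simulate a shuffle that sorts on (group key, value) and then
    groups on the group key only, so each group's values arrive sorted."""
    ordered = sorted(pairs, key=itemgetter(0, 1))
    return {
        key: [value for _, value in group]
        for key, group in groupby(ordered, key=itemgetter(0))
    }


# With "group all" there is a single group key for every sample.
pairs = [("all", 42), ("all", 7), ("all", 19)]
grouped = shuffle_with_secondary_sort(pairs)
```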


Daniel

On 03/05/2011 12:54 PM, Renato Marroquín Mogrovejo wrote:

Hey, does anybody know whether PIG-841 was implemented? And if it was, how is it
being used by Pig?
Thanks in advance.

Renato M.
