Re: Sampling in operations like Order by

Ruoyu Liu Fri, 23 May 2014 09:13:32 -0700

However, if the operation is an order by on multiple keys, there are three jobs.
The second and third jobs are similar to the two jobs used
when ordering by one key. Can anyone explain what the first, map-only job does?
Also, can anyone point me to the right place to learn how the various Pig
operations are executed?


Thanks!!
Ruoyu

On May 22, 2014, at 1:32 PM, Rohini Palaniswamy <[email protected]> wrote:

> If there is just one reducer, there is no need for sampling (PIG-2784), but
> when an order by uses more than one reducer, you need to sample the data
> and determine the partition ranges so that you can do a distributed order by.
> 
> Regards,
> Rohini
> 
> 
> On Thu, May 22, 2014 at 10:37 AM, Ruoyu Liu <[email protected]> wrote:
> 
>> Hi all,
>> 
>> I’m looking at the execution process of several operations and have a
>> question that may be naive; I hope someone can help me.
>> For operations like Order by, why do we use an extra MR job to sample
>> the data? In a plain Java implementation, we can always use one MR job
>> to implement the operation.
>> 
>> Thank you for your time!!
>> 
>> Best,
>> Ruoyu
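
The sampling step described in Rohini's reply above can be illustrated with a small sketch. This is not Pig's actual code: it is a minimal, single-process Python model of sample-based range partitioning, and the function names (sample_boundaries, distributed_order_by), the sample size, and the even-quantile boundary rule are all assumptions made for illustration. It shows why a sample is needed once there is more than one reducer: the boundaries must be known before any key can be routed to the right reducer.

```python
import bisect
import random


def sample_boundaries(keys, num_reducers, sample_size=100, seed=0):
    """Sampling pass: choose num_reducers - 1 split points from a random sample."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # Evenly spaced quantiles of the sorted sample become the range boundaries.
    return [sample[len(sample) * i // num_reducers] for i in range(1, num_reducers)]


def distributed_order_by(keys, num_reducers):
    """Sort pass: route each key to a reducer by range, then sort per reducer."""
    if num_reducers <= 1 or not keys:
        # One reducer sees all the data, so no sampling is required.
        return [sorted(keys)]
    boundaries = sample_boundaries(keys, num_reducers)
    partitions = [[] for _ in range(num_reducers)]
    for key in keys:
        # bisect_right counts the boundaries <= key, which is the reducer index.
        partitions[bisect.bisect_right(boundaries, key)].append(key)
    return [sorted(part) for part in partitions]


data = random.Random(42).sample(range(10_000), 1_000)
parts = distributed_order_by(data, num_reducers=4)
# Concatenating the reducer outputs in order gives a globally sorted result.
merged = [key for part in parts for key in part]
```

Because every key in partition i is smaller than every key in partition i + 1, the per-reducer outputs can simply be concatenated. Without the sample, the ranges could not be chosen so that each reducer gets a roughly equal share.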
