However, if the ORDER BY uses multiple keys, there are three jobs. The second and third jobs are similar to the two jobs produced by an ORDER BY on one key. Can anyone explain what the first map-only job does? Also, can anyone point me to the right place to learn about the execution details of various Pig operations?
Thanks!!
Ruoyu

On May 22, 2014, at 1:32 PM, Rohini Palaniswamy wrote:

> If there is just one reducer, there is no need for sampling (PIG-2784), but
> when there is more than one reducer in an ORDER BY, you need to sample the
> data and determine the partition ranges so that you can do a distributed
> ORDER BY.
>
> Regards,
> Rohini
>
>
> On Thu, May 22, 2014 at 10:37 AM, Ruoyu Liu wrote:
>
>> Hi all,
>>
>> I'm looking at the execution process of several operations and have a
>> question that may be naive; I hope someone can help me.
>> For operations like ORDER BY, why do we use an extra MR job to sample
>> the data? In a Java implementation, we can always use one MR job
>> to implement the operation.
>>
>> Thank you for your time!!
>>
>> Best,
>> Ruoyu
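To make Rohini's point concrete, here is a minimal single-process sketch in Python of a sample-then-range-partition sort. It is not Pig's code: the function names (`sample_keys`, `compute_boundaries`, `partition`, `distributed_order_by`) are hypothetical, the sampler is a plain uniform random sample, and the "reducers" are just in-memory lists. It shows why the sampling pass exists: with more than one reducer, each reducer must own a contiguous, non-overlapping key range so that concatenating the reducer outputs is globally sorted, and good range boundaries can only be chosen after looking at the key distribution. A multi-key sort is handled by using a tuple as the key, since Python tuples compare field by field.

```python
import bisect
import random


def sample_keys(records, key, sample_size, seed=0):
    """Sampling pass (the extra job): draw a uniform random sample of sort keys."""
    rng = random.Random(seed)
    keys = [key(r) for r in records]
    return rng.sample(keys, min(sample_size, len(keys)))


def compute_boundaries(sample, num_reducers):
    """Choose num_reducers - 1 cut points that split the sorted sample
    into roughly equal-sized ranges."""
    s = sorted(sample)
    return [s[(i * len(s)) // num_reducers] for i in range(1, num_reducers)]


def partition(k, boundaries):
    """Range partitioner: reducer i receives keys with
    boundaries[i-1] <= k < boundaries[i]."""
    return bisect.bisect_right(boundaries, k)


def distributed_order_by(records, key, num_reducers, sample_size=100):
    if num_reducers == 1 or not records:
        # With a single reducer every key meets in one place, so no sampling
        # is needed (the PIG-2784 case).
        return sorted(records, key=key)
    sample = sample_keys(records, key, sample_size)
    boundaries = compute_boundaries(sample, num_reducers)

    # Map side of the sort job: route each record to the reducer that owns
    # its key range.
    buckets = [[] for _ in range(num_reducers)]
    for r in records:
        buckets[partition(key(r), boundaries)].append(r)

    # Reduce side: each reducer sorts only its own range. The ranges are
    # disjoint and ordered, so concatenating the outputs gives a total order.
    out = []
    for bucket in buckets:
        out.extend(sorted(bucket, key=key))
    return out
```

For example, `distributed_order_by(rows, key=lambda r: (r[0], r[1]), num_reducers=4)` orders rows by the first field and then by the second, returning the same list as `sorted(rows, key=lambda r: (r[0], r[1]))`. In this sketch the quality of the sample only affects how evenly work is spread across reducers, not the correctness of the final order.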
