Thanks!! Best, Ruoyu
On May 23, 2014, at 1:11 PM, Daniel Dai <[email protected]> wrote:

> The first job simply reads the input and dumps it to HDFS. The first job is
> needed because:
> 1. SampleLoader does not work with non-HDFS loaders
> 2. SampleLoader does not process any operators before "order by"
>
> In some cases the first job can be optimized out, see
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer
>
> Thanks,
> Daniel
>
> On Thu, May 22, 2014 at 8:04 PM, Ruoyu Liu <[email protected]> wrote:
>> However, if the operation is an order by on multiple keys, there will be
>> three jobs. The second and third jobs are similar to the two jobs used
>> when ordering by one key. Can anyone point out what the first map-only job
>> does?
>> Also, can anyone point me to the right place to figure out the execution
>> details of various Pig operations?
>>
>> Thanks!!
>> Ruoyu
>>
>> On May 22, 2014, at 1:32 PM, Rohini Palaniswamy <[email protected]>
>> wrote:
>>
>>> If there is just one reducer there is no need for sampling (PIG-2784), but
>>> when there is more than one reducer in order by you need to sample the data
>>> and determine the partition ranges so that you can do a Distributed Orderby.
>>>
>>> Regards,
>>> Rohini
>>>
>>>
>>> On Thu, May 22, 2014 at 10:37 AM, Ruoyu Liu <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm looking at the execution process of several operations and have a
>>>> question that may be naive; I hope someone can help me.
>>>> For operations like Order by, why do we use an extra MR job to sample
>>>> the data? In a plain Java implementation, we can always use one MR job
>>>> to implement the operation.
>>>>
>>>> Thank you for your time!!
>>>>
>>>> Best,
>>>> Ruoyu
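
For anyone reading this thread later, here is a minimal Python sketch of the idea Rohini describes: sample the keys, pick partition ranges from the sample, route each record to a reducer by range, and sort within each reducer. It is only an illustration under simplifying assumptions (everything in memory, a fixed random seed), not Pig's actual implementation; the function names `choose_cut_points`, `partition` and `distributed_order_by` are hypothetical and do not correspond to anything in the Pig code base.

```python
import bisect
import random


def choose_cut_points(sample, num_reducers):
    """Pick num_reducers - 1 boundary keys (quantiles) from a sample of keys."""
    ordered = sorted(sample)
    return [ordered[len(ordered) * i // num_reducers]
            for i in range(1, num_reducers)]


def partition(key, cut_points):
    """Reducer index for a key: the number of cut points strictly below it."""
    return bisect.bisect_left(cut_points, key)


def distributed_order_by(records, num_reducers, sample_size=100, seed=0):
    """Simulate a sampled, range-partitioned sort across num_reducers."""
    # With a single reducer (or no data) there is nothing to partition,
    # so no sampling step is needed.
    if num_reducers == 1 or not records:
        return sorted(records)

    # "Sampling job": look at a subset of the keys to estimate the ranges.
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    cut_points = choose_cut_points(sample, num_reducers)

    # "Sort job": route each record to a reducer by key range, then each
    # reducer sorts only its own bucket.
    buckets = [[] for _ in range(num_reducers)]
    for record in records:
        buckets[partition(record, cut_points)].append(record)

    # Bucket i only holds keys <= cut_points[i] and bucket i + 1 only holds
    # keys above it, so concatenating the sorted buckets is globally sorted.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result
```

Because Python compares tuples element by element, the same sketch also covers ordering by multiple keys if each record is a tuple such as `(key1, key2)`. Without the sample, the ranges would have to be guessed, and a poor guess would send most records to one reducer.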
