I found the option. As long as it can disable all the optimizations, it
will be helpful. Thanks very much!

    -t, -optimizer_off - Turn optimizations off. The following values are
supported:
            SplitFilter - Split filter conditions
            PushUpFilter - Filter as early as possible
            MergeFilter - Merge filter conditions
            PushDownForeachFlatten - Join or explode as late as possible
            LimitOptimizer - Limit as early as possible
            ColumnMapKeyPrune - Remove unused data
            AddForEach - Add ForEach to remove unneeded columns
            MergeForEach - Merge adjacent ForEach
            GroupByConstParallelSetter - Force parallel 1 for "group all"
statement
            All - Disable all optimizations
        All optimizations listed here are enabled by default. Optimization
values are case insensitive.
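Based on the help text above, disabling optimizations would look something
like this (the script name is hypothetical, and passing -t multiple times
to disable several rules is an assumption worth verifying against pig -h
on your version):

```shell
# Turn off every optimization rule for this run (script name is hypothetical):
pig -t All myscript.pig

# Turn off only specific rules; values are case insensitive:
pig -t SplitFilter -t PushUpFilter myscript.pig
```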


Best regards,
Yongzhi

On Mon, Mar 12, 2012 at 6:52 PM, Dmitriy Ryaboy <[email protected]> wrote:

> There is a switch to turn off all optimizations. I don't recall what it is
> off the top of my head, but you can easily find out by running pig -h. Does
> that help?
>
> On Mar 12, 2012, at 2:58 PM, Yongzhi Wang <[email protected]>
> wrote:
>
> > I found that in the physical plan, there are 3 Split operators
> > generated. I think that's the reason for the 3 map tasks.
> >
> > Is it possible that in the future Pig can provide a parameter or syntax
> > to determine whether optimization will be applied? Sometimes a one-to-one
> > translation from the Pig script to the MapReduce plan is useful, since
> > the user can then control the executed MapReduce plan by manipulating
> > the script.
> >
> > Thanks!
> >
> > On Fri, Mar 9, 2012 at 2:02 PM, Dmitriy Ryaboy <[email protected]>
> wrote:
> >
> >> Pig always runs the same piece of code, generically speaking. There is
> >> no codegen. What actually happens is driven by serialized DAGs of the
> >> physical operators.
> >> This does seem like a bug (or an inefficiency, at least); we shouldn't
> >> need 3 mappers to do 3 filters of the same data and re-group.
> >>
> >> D
> >>
> >> On Thu, Mar 8, 2012 at 7:51 PM, Yongzhi Wang <
> [email protected]>
> >> wrote:
> >>> So the two map tasks should be running the same piece of code, but
> >>> reading different input?
> >>> Or are the two tasks actually running different code?
> >>> Is there any way that I can track the real map and reduce functions
> >>> that Pig sends to the workers?
> >>> Or can you tell me which piece of source code in the Pig project
> >>> generates the map and reduce tasks sent to the slave workers?
> >>>
> >>> Thanks!
> >>>
> >>>
> >>> On Thu, Mar 8, 2012 at 8:23 PM, Dmitriy Ryaboy <[email protected]>
> >> wrote:
> >>>
> >>>> That's what I get for reading explain plans on an iphone. Sorry.
> >>>>
> >>>> So, yeah, the cogrouping is happening as part of the shuffle.
> >>>> It seems like Pig's figuring a task per t1 and t2 (and then a logical
> >>>> union of the two, which is just to indicate that tuples from both
> >>>> relations go into the same meta-relation tagged with source, which
> >>>> will then get cogrouped). It shouldn't, it should be able to reuse the
> >>>> same scan of the source data for both t1 and t2.
> >>>>
> >>>> D
> >>>>
> >>>> On Thu, Mar 8, 2012 at 9:13 AM, Yongzhi Wang <
> >> [email protected]>
> >>>> wrote:
> >>>>> Thanks, Dmitriy. I understand that there is only one job containing
> >>>>> 2 map tasks and 1 reduce task. But the problem is that even though I
> >>>>> only have one input file of 1.4k (fewer than 50 rows of records), the
> >>>>> stats data still shows it needs 2 map tasks.
> >>>>>
> >>>>> The union operation is shown at the top of the Map Plan tree:
> >>>>> (Union[tuple] - scope-85)
> >>>>>
> >>>>> #--------------------------------------------------
> >>>>> # Map Reduce Plan
> >>>>> #--------------------------------------------------
> >>>>> MapReduce node scope-84
> >>>>> Map Plan
> >>>>> Union[tuple] - scope-85
> >>>>> |
> >>>>> |---result: Local Rearrange[tuple]{bytearray}(false) - scope-73
> >>>>> |   |   |
> >>>>> |   |   Project[bytearray][1] - scope-74
> >>>>> |   |
> >>>>> |   |---part1: Filter[bag] - scope-59
> >>>>> |       |   |
> >>>>> |       |   Greater Than[boolean] - scope-63
> >>>>> |       |   |
> >>>>> |       |   |---Cast[int] - scope-61
> >>>>> |       |   |   |
> >>>>> |       |   |   |---Project[bytearray][1] - scope-60
> >>>>> |       |   |
> >>>>> |       |   |---Constant(11) - scope-62
> >>>>> |       |
> >>>>> |       |---my_raw: New For Each(false,false,false)[bag] - scope-89
> >>>>> |           |   |
> >>>>> |           |   Project[bytearray][0] - scope-86
> >>>>> |           |   |
> >>>>> |           |   Project[bytearray][1] - scope-87
> >>>>> |           |   |
> >>>>> |           |   Project[bytearray][2] - scope-88
> >>>>> |           |
> >>>>> |           |---my_raw:
> >>>>> Load(hdfs://master:54310/user/root/houred-small:PigStorage('    ')) -
> >>>>> scope-90
> >>>>> |
> >>>>> |---result: Local Rearrange[tuple]{bytearray}(false) - scope-75
> >>>>>   |   |
> >>>>>   |   Project[bytearray][1] - scope-76
> >>>>>   |
> >>>>>   |---part2: Filter[bag] - scope-66
> >>>>>       |   |
> >>>>>       |   Less Than[boolean] - scope-70
> >>>>>       |   |
> >>>>>       |   |---Cast[int] - scope-68
> >>>>>       |   |   |
> >>>>>       |   |   |---Project[bytearray][1] - scope-67
> >>>>>       |   |
> >>>>>       |   |---Constant(13) - scope-69
> >>>>>       |
> >>>>>       |---my_raw: New For Each(false,false,false)[bag] - scope-94
> >>>>>           |   |
> >>>>>           |   Project[bytearray][0] - scope-91
> >>>>>           |   |
> >>>>>           |   Project[bytearray][1] - scope-92
> >>>>>           |   |
> >>>>>           |   Project[bytearray][2] - scope-93
> >>>>>           |
> >>>>>           |---my_raw:
> >>>>> Load(hdfs://master:54310/user/root/houred-small:PigStorage('    ')) -
> >>>>> scope-95--------
> >>>>> Reduce Plan
> >>>>> result: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-77
> >>>>> |
> >>>>> |---result: Package[tuple]{bytearray} - scope-72--------
> >>>>> Global sort: false
> >>>>>
> >>>>>
> >>>>> On Thu, Mar 8, 2012 at 1:14 AM, Dmitriy Ryaboy <[email protected]>
> >>>> wrote:
> >>>>>
> >>>>>> You are confusing map and reduce tasks with MapReduce jobs. Your
> >>>>>> Pig script resulted in a single MapReduce job. The number of map
> >>>>>> tasks was 2, based on input size -- it has little to do with the
> >>>>>> actual operators you used.
> >>>>>>
> >>>>>> There is no union operator involved so I am not sure what you are
> >>>>>> referring to with that.
> >>>>>>
> >>>>>> On Mar 7, 2012, at 8:09 AM, Yongzhi Wang <
> [email protected]
> >>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi there,
> >>>>>>>
> >>>>>>> I tried to use the "explain" statement, but the MapReduce plan
> >>>>>>> sometimes confuses me.
> >>>>>>>
> >>>>>>> I tried the script below:
> >>>>>>>
> >>>>>>> *my_raw = LOAD './houred-small' USING PigStorage('\t') AS
> >> (user,hour,
> >>>>>>> query);
> >>>>>>> part1 = filter my_raw by hour>11;
> >>>>>>> part2 = filter my_raw by hour<13;
> >>>>>>> result = cogroup part1 by hour, part2 by hour;
> >>>>>>> dump result;
> >>>>>>> explain result;*
> >>>>>>>
> >>>>>>> The job stats, shown below, indicate there are 2 map tasks and 1
> >>>>>>> reduce task. But I don't know how the map tasks correspond to the
> >>>>>>> MapReduce plan shown below. It seems each map task just does one
> >>>>>>> filter and rearrange, but in which phase is the union operation
> >>>>>>> done? The shuffle phase? If so, the two map tasks actually do
> >>>>>>> different filter work. Is that possible? Or is my guess wrong?
> >>>>>>>
> >>>>>>> So, back to the question: *Is there any way that I can see the
> >>>>>>> actual map and reduce tasks executed by Pig?*
> >>>>>>>
> >>>>>>> *Job Stats (time in seconds):
> >>>>>>> JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime
> >>>>>>> MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature
> >>>> Outputs
> >>>>>>> job_201203021230_0038   2       1       3       3       3       12
> >>>>>>> 12     1    2       my_raw,part1,part2,result       COGROUP
> >>>>>>> hdfs://master:54310/tmp/temp626037557/tmp-1661404166,
> >>>>>>> *
> >>>>>>>
> >>>>>>> The MapReduce plan is shown below:*
> >>>>>>> #--------------------------------------------------
> >>>>>>> # Map Reduce Plan
> >>>>>>> #--------------------------------------------------
> >>>>>>> MapReduce node scope-84
> >>>>>>> Map Plan
> >>>>>>> Union[tuple] - scope-85
> >>>>>>> |
> >>>>>>> |---result: Local Rearrange[tuple]{bytearray}(false) - scope-73
> >>>>>>> |   |   |
> >>>>>>> |   |   Project[bytearray][1] - scope-74
> >>>>>>> |   |
> >>>>>>> |   |---part1: Filter[bag] - scope-59
> >>>>>>> |       |   |
> >>>>>>> |       |   Greater Than[boolean] - scope-63
> >>>>>>> |       |   |
> >>>>>>> |       |   |---Cast[int] - scope-61
> >>>>>>> |       |   |   |
> >>>>>>> |       |   |   |---Project[bytearray][1] - scope-60
> >>>>>>> |       |   |
> >>>>>>> |       |   |---Constant(11) - scope-62
> >>>>>>> |       |
> >>>>>>> |       |---my_raw: New For Each(false,false,false)[bag] - scope-89
> >>>>>>> |           |   |
> >>>>>>> |           |   Project[bytearray][0] - scope-86
> >>>>>>> |           |   |
> >>>>>>> |           |   Project[bytearray][1] - scope-87
> >>>>>>> |           |   |
> >>>>>>> |           |   Project[bytearray][2] - scope-88
> >>>>>>> |           |
> >>>>>>> |           |---my_raw:
> >>>>>>> Load(hdfs://master:54310/user/root/houred-small:PigStorage('
> >> ')) -
> >>>>>>> scope-90
> >>>>>>> |
> >>>>>>> |---result: Local Rearrange[tuple]{bytearray}(false) - scope-75
> >>>>>>>   |   |
> >>>>>>>   |   Project[bytearray][1] - scope-76
> >>>>>>>   |
> >>>>>>>   |---part2: Filter[bag] - scope-66
> >>>>>>>       |   |
> >>>>>>>       |   Less Than[boolean] - scope-70
> >>>>>>>       |   |
> >>>>>>>       |   |---Cast[int] - scope-68
> >>>>>>>       |   |   |
> >>>>>>>       |   |   |---Project[bytearray][1] - scope-67
> >>>>>>>       |   |
> >>>>>>>       |   |---Constant(13) - scope-69
> >>>>>>>       |
> >>>>>>>       |---my_raw: New For Each(false,false,false)[bag] - scope-94
> >>>>>>>           |   |
> >>>>>>>           |   Project[bytearray][0] - scope-91
> >>>>>>>           |   |
> >>>>>>>           |   Project[bytearray][1] - scope-92
> >>>>>>>           |   |
> >>>>>>>           |   Project[bytearray][2] - scope-93
> >>>>>>>           |
> >>>>>>>           |---my_raw:
> >>>>>>> Load(hdfs://master:54310/user/root/houred-small:PigStorage('
> >> ')) -
> >>>>>>> scope-95--------
> >>>>>>> Reduce Plan
> >>>>>>> result: Store(fakefile:org.apache.pig.builtin.PigStorage) -
> >> scope-77
> >>>>>>> |
> >>>>>>> |---result: Package[tuple]{bytearray} - scope-72--------
> >>>>>>> Global sort: false
> >>>>>>> ----------------*
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>
> >>>>
> >>
>
