Thanks, Dmitriy. I understand that there is only one job, containing 2 map
tasks and 1 reduce task. But the problem is that even if I have only one input
file with a size of 1.4k (fewer than 50 rows of records), the stats data
still show that it needs 2 map tasks.

The union operation is shown at the top of the Map plan tree (Union[tuple]
- scope-85):

  #--------------------------------------------------
  # Map Reduce Plan
  #--------------------------------------------------
  MapReduce node scope-84
  Map Plan
  Union[tuple] - scope-85
  |
  |---result: Local Rearrange[tuple]{bytearray}(false) - scope-73
  |   |   |
  |   |   Project[bytearray][1] - scope-74
  |   |
  |   |---part1: Filter[bag] - scope-59
  |       |   |
  |       |   Greater Than[boolean] - scope-63
  |       |   |
  |       |   |---Cast[int] - scope-61
  |       |   |   |
  |       |   |   |---Project[bytearray][1] - scope-60
  |       |   |
  |       |   |---Constant(11) - scope-62
  |       |
  |       |---my_raw: New For Each(false,false,false)[bag] - scope-89
  |           |   |
  |           |   Project[bytearray][0] - scope-86
  |           |   |
  |           |   Project[bytearray][1] - scope-87
  |           |   |
  |           |   Project[bytearray][2] - scope-88
  |           |
  |           |---my_raw: Load(hdfs://master:54310/user/root/houred-small:PigStorage('    ')) - scope-90
  |
  |---result: Local Rearrange[tuple]{bytearray}(false) - scope-75
    |   |
    |   Project[bytearray][1] - scope-76
    |
    |---part2: Filter[bag] - scope-66
        |   |
        |   Less Than[boolean] - scope-70
        |   |
        |   |---Cast[int] - scope-68
        |   |   |
        |   |   |---Project[bytearray][1] - scope-67
        |   |
        |   |---Constant(13) - scope-69
        |
        |---my_raw: New For Each(false,false,false)[bag] - scope-94
            |   |
            |   Project[bytearray][0] - scope-91
            |   |
            |   Project[bytearray][1] - scope-92
            |   |
            |   Project[bytearray][2] - scope-93
            |
            |---my_raw: Load(hdfs://master:54310/user/root/houred-small:PigStorage('    ')) - scope-95--------
  Reduce Plan
  result: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-77
  |
  |---result: Package[tuple]{bytearray} - scope-72--------
  Global sort: false
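
To make the data flow in the plan above concrete, here is a rough Python
sketch of how I read it. It is not Pig code and not how Pig is implemented;
the sample rows and the function names are made up for illustration. Each
Filter branch feeds a Local Rearrange that tags tuples with the cogroup key
(hour) and the input index, the Union merges both branches, and Package in
the reduce plan builds one bag per input for each key.

```python
from collections import defaultdict

# Hypothetical sample rows in the (user, hour, query) layout of my_raw.
MY_RAW = [
    ("u1", "10", "a"),
    ("u2", "12", "b"),
    ("u3", "14", "c"),
]

def local_rearrange(rows, branch):
    """Tag each row with its cogroup key (hour, column 1) and its input index."""
    return [(row[1], branch, row) for row in rows]

def map_plan(rows):
    """Mimic the Map Plan: two Filter branches whose outputs meet at the Union."""
    part1 = [r for r in rows if int(r[1]) > 11]  # Filter: hour > 11
    part2 = [r for r in rows if int(r[1]) < 13]  # Filter: hour < 13
    return local_rearrange(part1, 0) + local_rearrange(part2, 1)  # Union

def reduce_plan(tagged):
    """Mimic Package: for each key, build one bag per cogroup input."""
    groups = defaultdict(lambda: ([], []))
    for key, branch, row in tagged:
        groups[key][branch].append(row)
    return dict(groups)

result = reduce_plan(map_plan(MY_RAW))
for hour in sorted(result):
    print(hour, result[hour])
```

In this sketch the Union is just a merge of the two tagged streams before
the shuffle; the actual grouping happens only in the reduce step.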


On Thu, Mar 8, 2012 at 1:14 AM, Dmitriy Ryaboy <[email protected]> wrote:

> You are confusing map and reduce tasks with mapreduce jobs. Your pig
> script resulted in a single mapreduce job. The number of map tasks was 2,
> based on input size -- it has little to do with the actual operators you
> used.
>
> There is no union operator involved so I am not sure what you are
> referring to with that.
>
> On Mar 7, 2012, at 8:09 AM, Yongzhi Wang <[email protected]>
> wrote:
>
> > Hi there,
> >
> > I tried to use the "explain" command, but the MapReduce plan sometimes
> > confuses me.
> >
> > I tried the script below:
> >
> > *my_raw = LOAD './houred-small' USING PigStorage('\t') AS (user,hour,
> > query);
> > part1 = filter my_raw by hour>11;
> > part2 = filter my_raw by hour<13;
> > result = cogroup part1 by hour, part2 by hour;
> > dump result;
> > explain result;*
> >
> > The job stats are shown below, indicating there are 2 Map tasks and 1
> > reduce task. But I don't know how the Map tasks map to the MapReduce
> > plan shown below. It seems each Map task just does one filter and
> > rearrange, but in which phase is the union operation done? The shuffle
> > phase? In that case, the two Map tasks would actually do different
> > filter work. Is that possible? Or is my guess wrong?
> >
> > So, back to the question: *Is there any way that I can see the actual map
> > and reduce tasks executed in Pig?*
> >
> > *Job Stats (time in seconds):
> > JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime
> > MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
> > job_201203021230_0038   2       1       3       3       3       12
> > 12     1    2       my_raw,part1,part2,result       COGROUP
> > hdfs://master:54310/tmp/temp6260
> > 37557/tmp-1661404166,
> > *
> >
> > The mapreduce plan is shown below:*
> > #--------------------------------------------------
> > # Map Reduce Plan
> > #--------------------------------------------------
> > MapReduce node scope-84
> > Map Plan
> > Union[tuple] - scope-85
> > |
> > |---result: Local Rearrange[tuple]{bytearray}(false) - scope-73
> > |   |   |
> > |   |   Project[bytearray][1] - scope-74
> > |   |
> > |   |---part1: Filter[bag] - scope-59
> > |       |   |
> > |       |   Greater Than[boolean] - scope-63
> > |       |   |
> > |       |   |---Cast[int] - scope-61
> > |       |   |   |
> > |       |   |   |---Project[bytearray][1] - scope-60
> > |       |   |
> > |       |   |---Constant(11) - scope-62
> > |       |
> > |       |---my_raw: New For Each(false,false,false)[bag] - scope-89
> > |           |   |
> > |           |   Project[bytearray][0] - scope-86
> > |           |   |
> > |           |   Project[bytearray][1] - scope-87
> > |           |   |
> > |           |   Project[bytearray][2] - scope-88
> > |           |
> > |           |---my_raw: Load(hdfs://master:54310/user/root/houred-small:PigStorage('    ')) - scope-90
> > |
> > |---result: Local Rearrange[tuple]{bytearray}(false) - scope-75
> >    |   |
> >    |   Project[bytearray][1] - scope-76
> >    |
> >    |---part2: Filter[bag] - scope-66
> >        |   |
> >        |   Less Than[boolean] - scope-70
> >        |   |
> >        |   |---Cast[int] - scope-68
> >        |   |   |
> >        |   |   |---Project[bytearray][1] - scope-67
> >        |   |
> >        |   |---Constant(13) - scope-69
> >        |
> >        |---my_raw: New For Each(false,false,false)[bag] - scope-94
> >            |   |
> >            |   Project[bytearray][0] - scope-91
> >            |   |
> >            |   Project[bytearray][1] - scope-92
> >            |   |
> >            |   Project[bytearray][2] - scope-93
> >            |
> >            |---my_raw: Load(hdfs://master:54310/user/root/houred-small:PigStorage('    ')) - scope-95--------
> > Reduce Plan
> > result: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-77
> > |
> > |---result: Package[tuple]{bytearray} - scope-72--------
> > Global sort: false
> > ----------------*
> >
> > Thanks!
>
