That's what I get for reading explain plans on an iPhone. Sorry.

So, yes, the cogrouping is happening as part of the shuffle.
It looks like Pig is planning a separate map pipeline for each of part1
and part2 (and then a logical union of the two, which just indicates
that tuples from both relations go into the same meta-relation, tagged
with their source, before being cogrouped). It shouldn't do that: it
should be able to reuse the same scan of the source data for both
part1 and part2.
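
For what it's worth, here is a minimal sketch of the same logic written with
an explicit SPLIT (file name and field names taken from the script quoted
below; whether the optimizer merges the two loads in your original version
depends on the Pig build you are running):

-- sketch only: split the single load once instead of filtering it twice
my_raw = LOAD './houred-small' USING PigStorage('\t') AS (user, hour, query);
SPLIT my_raw INTO part1 IF (int)hour > 11, part2 IF (int)hour < 13;
result = COGROUP part1 BY hour, part2 BY hour;
explain result;

With that variant the explain output should show a single Load feeding a
Split operator rather than two Loads of the same file.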

D
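
As for seeing what actually ran: the grunt shell's explain can dump the full
plan to a file or to DOT for inspection (option availability depends on the
Pig version), for example:

explain -out plan.txt result;
explain -dot -out plan.dot result;

The per-task view -- which map task read which split, task times, counters --
comes from the JobTracker web UI for the generated job rather than from Pig
itself.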

On Thu, Mar 8, 2012 at 9:13 AM, Yongzhi Wang <[email protected]> wrote:
> Thanks, Dmitriy. I understand that there is only one job containing 2 map
> tasks and 1 reduce task. But the problem is that even if I only have one
> input file of 1.4 KB (fewer than 50 rows of records), the stats data
> still shows that it needs 2 map tasks.
>
> The union operation is shown at the top of the Map plan tree: (Union[tuple]
> - scope-85)
>
>  #--------------------------------------------------
>  # Map Reduce Plan
>  #--------------------------------------------------
>  MapReduce node scope-84
>  Map Plan
>  Union[tuple] - scope-85
>  |
>  |---result: Local Rearrange[tuple]{bytearray}(false) - scope-73
>  |   |   |
>  |   |   Project[bytearray][1] - scope-74
>  |   |
>  |   |---part1: Filter[bag] - scope-59
>  |       |   |
>  |       |   Greater Than[boolean] - scope-63
>  |       |   |
>  |       |   |---Cast[int] - scope-61
>  |       |   |   |
>  |       |   |   |---Project[bytearray][1] - scope-60
>  |       |   |
>  |       |   |---Constant(11) - scope-62
>  |       |
>  |       |---my_raw: New For Each(false,false,false)[bag] - scope-89
>  |           |   |
>  |           |   Project[bytearray][0] - scope-86
>  |           |   |
>  |           |   Project[bytearray][1] - scope-87
>  |           |   |
>  |           |   Project[bytearray][2] - scope-88
>  |           |
>  |           |---my_raw:
>  Load(hdfs://master:54310/user/root/houred-small:PigStorage('    ')) -
>  scope-90
>  |
>  |---result: Local Rearrange[tuple]{bytearray}(false) - scope-75
>    |   |
>    |   Project[bytearray][1] - scope-76
>    |
>    |---part2: Filter[bag] - scope-66
>        |   |
>        |   Less Than[boolean] - scope-70
>        |   |
>        |   |---Cast[int] - scope-68
>        |   |   |
>        |   |   |---Project[bytearray][1] - scope-67
>        |   |
>        |   |---Constant(13) - scope-69
>        |
>        |---my_raw: New For Each(false,false,false)[bag] - scope-94
>            |   |
>            |   Project[bytearray][0] - scope-91
>            |   |
>            |   Project[bytearray][1] - scope-92
>            |   |
>            |   Project[bytearray][2] - scope-93
>            |
>            |---my_raw:
>  Load(hdfs://master:54310/user/root/houred-small:PigStorage('    ')) -
>  scope-95
>  --------
>  Reduce Plan
>  result: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-77
>  |
>  |---result: Package[tuple]{bytearray} - scope-72
>  --------
>  Global sort: false
>
>
> On Thu, Mar 8, 2012 at 1:14 AM, Dmitriy Ryaboy <[email protected]> wrote:
>
>> You are confusing map and reduce tasks with mapreduce jobs. Your Pig
>> script resulted in a single mapreduce job. The number of map tasks was 2,
>> based on input size -- it has little to do with the actual operators you
>> used.
>>
>> There is no union operator involved, so I am not sure what you are
>> referring to there.
>>
>> On Mar 7, 2012, at 8:09 AM, Yongzhi Wang <[email protected]>
>> wrote:
>>
>> > Hi there,
>> >
>> > I tried to use the "explain" syntax, but the MapReduce plan sometimes
>> > confuses me.
>> >
>> > I tried the script below:
>> >
>> > *my_raw = LOAD './houred-small' USING PigStorage('\t') AS (user, hour, query);
>> > part1 = filter my_raw by hour>11;
>> > part2 = filter my_raw by hour<13;
>> > result = cogroup part1 by hour, part2 by hour;
>> > dump result;
>> > explain result;*
>> >
>> > The job stats are shown below, indicating there are 2 Map tasks and 1
>> > reduce task. But I don't know how the Map tasks map onto the MapReduce
>> > plan shown below. It seems each Map task does just one filter and
>> > rearrange, but in which phase is the union operation done? The shuffle
>> > phase? If that is the case, the two Map tasks actually did different
>> > filter work. Is that possible? Or is my guess wrong?
>> >
>> > So, back to the question: *Is there any way that I can see the actual map
>> > and reduce tasks executed by Pig?*
>> >
>> > *Job Stats (time in seconds):
>> > JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime
>> > MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
>> > job_201203021230_0038   2       1       3       3       3       12
>> > 12     1    2       my_raw,part1,part2,result       COGROUP
>> > hdfs://master:54310/tmp/temp6260
>> > 37557/tmp-1661404166,
>> > *
>> >
>> > The MapReduce plan is shown below:*
>> > #--------------------------------------------------
>> > # Map Reduce Plan
>> > #--------------------------------------------------
>> > MapReduce node scope-84
>> > Map Plan
>> > Union[tuple] - scope-85
>> > |
>> > |---result: Local Rearrange[tuple]{bytearray}(false) - scope-73
>> > |   |   |
>> > |   |   Project[bytearray][1] - scope-74
>> > |   |
>> > |   |---part1: Filter[bag] - scope-59
>> > |       |   |
>> > |       |   Greater Than[boolean] - scope-63
>> > |       |   |
>> > |       |   |---Cast[int] - scope-61
>> > |       |   |   |
>> > |       |   |   |---Project[bytearray][1] - scope-60
>> > |       |   |
>> > |       |   |---Constant(11) - scope-62
>> > |       |
>> > |       |---my_raw: New For Each(false,false,false)[bag] - scope-89
>> > |           |   |
>> > |           |   Project[bytearray][0] - scope-86
>> > |           |   |
>> > |           |   Project[bytearray][1] - scope-87
>> > |           |   |
>> > |           |   Project[bytearray][2] - scope-88
>> > |           |
>> > |           |---my_raw:
>> > Load(hdfs://master:54310/user/root/houred-small:PigStorage('    ')) -
>> > scope-90
>> > |
>> > |---result: Local Rearrange[tuple]{bytearray}(false) - scope-75
>> >    |   |
>> >    |   Project[bytearray][1] - scope-76
>> >    |
>> >    |---part2: Filter[bag] - scope-66
>> >        |   |
>> >        |   Less Than[boolean] - scope-70
>> >        |   |
>> >        |   |---Cast[int] - scope-68
>> >        |   |   |
>> >        |   |   |---Project[bytearray][1] - scope-67
>> >        |   |
>> >        |   |---Constant(13) - scope-69
>> >        |
>> >        |---my_raw: New For Each(false,false,false)[bag] - scope-94
>> >            |   |
>> >            |   Project[bytearray][0] - scope-91
>> >            |   |
>> >            |   Project[bytearray][1] - scope-92
>> >            |   |
>> >            |   Project[bytearray][2] - scope-93
>> >            |
>> >            |---my_raw:
>> > Load(hdfs://master:54310/user/root/houred-small:PigStorage('    ')) -
>> > scope-95
>> > --------
>> > Reduce Plan
>> > result: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-77
>> > |
>> > |---result: Package[tuple]{bytearray} - scope-72
>> > --------
>> > Global sort: false
>> > ----------------*
>> >
>> > Thanks!
>>
