That's what I get for reading explain plans on an iPhone. Sorry.

So, yeah, the cogrouping happens as part of the shuffle. It looks like Pig is planning a separate map task for each of t1 and t2 (your part1 and part2), followed by a logical union of the two. The union just indicates that tuples from both relations go into the same meta-relation, tagged with their source, which then gets cogrouped. It shouldn't do that; it should be able to reuse a single scan of the source data for both t1 and t2.
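To make the "tagged with source" idea concrete, here is a rough Python sketch of the dataflow. It is only an illustration under my own assumptions, not Pig's actual implementation: the sample rows and the names `map_side` and `shuffle_and_package` are made up. It also applies both filters in a single pass over the input, which is the scan reuse I would expect to be possible, whereas the plan in your mail loads the file once per branch.

```python
from collections import defaultdict

# Hypothetical sample rows (user, hour, query); not taken from the real input file.
rows = [
    ("u1", 10, "a"),
    ("u2", 12, "b"),
    ("u3", 14, "c"),
]

def map_side(rows):
    """One scan: apply both filters and tag each surviving tuple with its source."""
    for row in rows:
        hour = row[1]
        if hour > 11:            # part1 = filter my_raw by hour > 11
            yield hour, 0, row   # (key, source index, tuple), roughly a Local Rearrange
        if hour < 13:            # part2 = filter my_raw by hour < 13
            yield hour, 1, row

def shuffle_and_package(tagged, num_inputs=2):
    """Group by key, then split each group into one bag per source (roughly Package)."""
    groups = defaultdict(lambda: [[] for _ in range(num_inputs)])
    for key, source, row in tagged:
        groups[key][source].append(row)
    return {key: tuple(bags) for key, bags in sorted(groups.items())}

result = shuffle_and_package(map_side(rows))
for key, (bag1, bag2) in result.items():
    print(key, bag1, bag2)
```

The "union" here is just the single stream of tagged tuples coming out of `map_side`; nothing is merged until the tuples are grouped by key, which corresponds to the shuffle.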
D

On Thu, Mar 8, 2012 at 9:13 AM, Yongzhi Wang <[email protected]> wrote:

> Thanks, Dmitriy. I understand that there is only one job containing 2 map
> tasks and 1 reduce task. But the problem is that even if I only have one
> input file with a size of 1.4k (less than 50 rows of records), the stats
> data still shows it needs 2 map tasks.
>
> The union operation is shown at the top of the Map plan tree: (Union[tuple]
> - scope-85)
>
> #--------------------------------------------------
> # Map Reduce Plan
> #--------------------------------------------------
> MapReduce node scope-84
> Map Plan
> Union[tuple] - scope-85
> |
> |---result: Local Rearrange[tuple]{bytearray}(false) - scope-73
> |   |   |
> |   |   Project[bytearray][1] - scope-74
> |   |
> |   |---part1: Filter[bag] - scope-59
> |       |   |
> |       |   Greater Than[boolean] - scope-63
> |       |   |
> |       |   |---Cast[int] - scope-61
> |       |   |   |
> |       |   |   |---Project[bytearray][1] - scope-60
> |       |   |
> |       |   |---Constant(11) - scope-62
> |       |
> |       |---my_raw: New For Each(false,false,false)[bag] - scope-89
> |           |   |
> |           |   Project[bytearray][0] - scope-86
> |           |   |
> |           |   Project[bytearray][1] - scope-87
> |           |   |
> |           |   Project[bytearray][2] - scope-88
> |           |
> |           |---my_raw: Load(hdfs://master:54310/user/root/houred-small:PigStorage(' ')) - scope-90
> |
> |---result: Local Rearrange[tuple]{bytearray}(false) - scope-75
>     |   |
>     |   Project[bytearray][1] - scope-76
>     |
>     |---part2: Filter[bag] - scope-66
>         |   |
>         |   Less Than[boolean] - scope-70
>         |   |
>         |   |---Cast[int] - scope-68
>         |   |   |
>         |   |   |---Project[bytearray][1] - scope-67
>         |   |
>         |   |---Constant(13) - scope-69
>         |
>         |---my_raw: New For Each(false,false,false)[bag] - scope-94
>             |   |
>             |   Project[bytearray][0] - scope-91
>             |   |
>             |   Project[bytearray][1] - scope-92
>             |   |
>             |   Project[bytearray][2] - scope-93
>             |
>             |---my_raw: Load(hdfs://master:54310/user/root/houred-small:PigStorage(' ')) - scope-95--------
> Reduce Plan
> result: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-77
> |
> |---result: Package[tuple]{bytearray} - scope-72--------
> Global sort: false
>
>
> On Thu, Mar 8, 2012 at 1:14 AM, Dmitriy Ryaboy <[email protected]> wrote:
>
>> You are confusing map and reduce tasks with mapreduce jobs. Your pig
>> script resulted in a single mapreduce job. The number of map tasks was 2,
>> based on input size -- it has little to do with the actual operators you
>> used.
>>
>> There is no union operator involved so I am not sure what you are
>> referring to with that.
>>
>> On Mar 7, 2012, at 8:09 AM, Yongzhi Wang <[email protected]>
>> wrote:
>>
>> > Hi, There
>> >
>> > I tried to use the "explain" syntax, but the MapReduce plan sometimes
>> > confuses me.
>> >
>> > I tried the script below:
>> >
>> > *my_raw = LOAD './houred-small' USING PigStorage('\t') AS (user,hour,
>> > query);
>> > part1 = filter my_raw by hour>11;
>> > part2 = filter my_raw by hour<13;
>> > result = cogroup part1 by hour, part2 by hour;
>> > dump result;
>> > explain result;*
>> >
>> > The job stats are shown below, indicating there are 2 Map tasks and 1
>> > reduce task. But I don't know how the Map tasks map to the MapReduce
>> > plan shown below. It seems each Map task just does one filter and
>> > rearrange, but in which phase is the union operation done? The shuffle
>> > phase? In that case, the two Map tasks actually do different filter
>> > work. Is that possible? Or is my guess wrong?
>> >
>> > So, back to the question: *Is there any way that I can see the actual map
>> > and reduce tasks executed in Pig?*
>> >
>> > *Job Stats (time in seconds):
>> > JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime
>> > MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
>> > job_201203021230_0038 2 1 3 3 3 12
>> > 12 12 my_raw,part1,part2,result COGROUP
>> > hdfs://master:54310/tmp/temp6260
>> > 37557/tmp-1661404166,
>> > *
>> >
>> > The mapreduce plan is shown below:*
>> > #--------------------------------------------------
>> > # Map Reduce Plan
>> > #--------------------------------------------------
>> > MapReduce node scope-84
>> > Map Plan
>> > Union[tuple] - scope-85
>> > |
>> > |---result: Local Rearrange[tuple]{bytearray}(false) - scope-73
>> > |   |   |
>> > |   |   Project[bytearray][1] - scope-74
>> > |   |
>> > |   |---part1: Filter[bag] - scope-59
>> > |       |   |
>> > |       |   Greater Than[boolean] - scope-63
>> > |       |   |
>> > |       |   |---Cast[int] - scope-61
>> > |       |   |   |
>> > |       |   |   |---Project[bytearray][1] - scope-60
>> > |       |   |
>> > |       |   |---Constant(11) - scope-62
>> > |       |
>> > |       |---my_raw: New For Each(false,false,false)[bag] - scope-89
>> > |           |   |
>> > |           |   Project[bytearray][0] - scope-86
>> > |           |   |
>> > |           |   Project[bytearray][1] - scope-87
>> > |           |   |
>> > |           |   Project[bytearray][2] - scope-88
>> > |           |
>> > |           |---my_raw: Load(hdfs://master:54310/user/root/houred-small:PigStorage(' ')) - scope-90
>> > |
>> > |---result: Local Rearrange[tuple]{bytearray}(false) - scope-75
>> >     |   |
>> >     |   Project[bytearray][1] - scope-76
>> >     |
>> >     |---part2: Filter[bag] - scope-66
>> >         |   |
>> >         |   Less Than[boolean] - scope-70
>> >         |   |
>> >         |   |---Cast[int] - scope-68
>> >         |   |   |
>> >         |   |   |---Project[bytearray][1] - scope-67
>> >         |   |
>> >         |   |---Constant(13) - scope-69
>> >         |
>> >         |---my_raw: New For Each(false,false,false)[bag] - scope-94
>> >             |   |
>> >             |   Project[bytearray][0] - scope-91
>> >             |   |
>> >             |   Project[bytearray][1] - scope-92
>> >             |   |
>> >             |   Project[bytearray][2] - scope-93
>> >             |
>> >             |---my_raw: Load(hdfs://master:54310/user/root/houred-small:PigStorage(' ')) - scope-95--------
>> > Reduce Plan
>> > result: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-77
>> > |
>> > |---result: Package[tuple]{bytearray} - scope-72--------
>> > Global sort: false
>> > ----------------*
>> >
>> > Thanks!
>>
