Vitthal Gogate
Wed, 07 May 2008 09:44:13 -0700
I assume the block size is 128MB. I guess it is specified in the Hadoop site configuration file. Also, for a join, the mapred program creates a number of map tasks based on the combined size of both tables/files being joined.

-regards,
Suhas

On 5/7/08 12:57 AM, "hyoung jun kim" <[EMAIL PROTECTED]> wrote:

> Hi all,
> I read "pig_hadoopsummit.pdf" and tried it.
> I made a 320MB file (visit) in dir1 and a 20MB file (page) in dir2.
> And ran this script.
>
> Visits= load '/dir1/visit' as (user, url, time);
> Visits= foreach Visits generate user, url, time;
> Pages= load '/dir2/page' as (url, pagerank);
> VP= join Visits by url, Pages by url;
> Results = foreach UserVisits generate group, AVG(VP.pagerank) as avgpr;
> store Results into '/data/users';
>
> I expected 6 maps (320MB/64MB) + 1 map (20MB) tasks.
> But Hadoop makes 2 map tasks and 1 reduce task.
> Why did Hadoop make only 2 map tasks?
>
> Test environment:
> - 5 hadoop cluster
> - hadoop 0.16.3
> - pig updated from svn repository on May.7
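
For reference, here is a rough sketch of the arithmetic being discussed in this thread. It assumes the simplest possible model, one map task per block-sized chunk of input, which is only an approximation and not how Pig or Hadoop necessarily compute splits. The function names are made up for illustration; the file sizes (320MB and 20MB) and block sizes (64MB and 128MB) are the ones mentioned above.

```python
import math


def maps_per_file(file_sizes_mb, block_size_mb):
    """Estimate map tasks when each file is split on its own.

    Simplifying assumption: one map task per block-sized chunk,
    with a partial final chunk still counting as one task.
    """
    return sum(math.ceil(size / block_size_mb) for size in file_sizes_mb)


def maps_combined(file_sizes_mb, block_size_mb):
    """Estimate map tasks when the inputs are treated as one combined size."""
    return math.ceil(sum(file_sizes_mb) / block_size_mb)


if __name__ == "__main__":
    sizes = [320, 20]  # visit and page files, in MB
    for block in (64, 128):
        print(
            f"block={block}MB",
            f"per-file={maps_per_file(sizes, block)}",
            f"combined={maps_combined(sizes, block)}",
        )
```

Under this simple model neither block size gives exactly 2 map tasks, so the observed count probably also depends on how the splits are actually computed in this Pig/Hadoop version, which this sketch does not try to reproduce.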