hyoung jun kim
Wed, 07 May 2008 16:56:45 -0700
Thanks, but I checked the Hadoop configuration: the "dfs.block.size" value is 67108864. I also checked the input file's number of blocks: the input file has 6 blocks.

2008/5/8 Vitthal Gogate <[EMAIL PROTECTED]>:
> Sorry, I meant: check the "dfs.block.size" parameter in the hadoop-site.xml
> file in the $HADOOP_HOME/conf directory; it may be configured as 128MB.
>
> Sorry, in the following reply I kind of assumed the default size was 128MB :)
>
> regards
>
>
> On 5/7/08 9:42 AM, "Vitthal Gogate" <[EMAIL PROTECTED]> wrote:
>
> > I assume the block size is 128MB. I guess it is specified in the Hadoop site
> > configuration file. Also, for a join the mapred program creates the number of
> > map tasks based on the combined size of both tables/files being joined.
> >
> > -regards, Suhas
> >
> >
> > On 5/7/08 12:57 AM, "hyoung jun kim" <[EMAIL PROTECTED]> wrote:
> >
> >> Hi all,
> >> I read "pig_hadoopsummit.pdf" and tried it.
> >> I made a 320MB file (visit) in dir1 and a 20MB file (page) in dir2,
> >> and ran this script:
> >>
> >> Visits = load '/dir1/visit' as (user, url, time);
> >> Visits = foreach Visits generate user, url, time;
> >> Pages = load '/dir2/page' as (url, pagerank);
> >> VP = join Visits by url, Pages by url;
> >> UserVisits = group VP by user;
> >> Results = foreach UserVisits generate group, AVG(VP.pagerank) as avgpr;
> >> store Results into '/data/users';
> >>
> >> I expected 6 maps (320MB/64MB) + 1 map (20MB) tasks.
> >> But Hadoop makes 2 map tasks and 1 reduce task.
> >> Why did Hadoop make only 2 map tasks?
> >>
> >> Test environment:
> >> - 5-node Hadoop cluster
> >> - Hadoop 0.16.3
> >> - Pig updated from the svn repository on May 7
> >
> >
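For reference, the arithmetic behind "one map per block" can be sketched as below. This is a simplified model, loosely following the split-size rule of Hadoop's old-API FileInputFormat (split size = max(minSize, min(goalSize, blockSize)), where goalSize is the total input divided by the requested number of maps). The function names are hypothetical, and the model ignores how the last partial split is handled and any slicing Pig may do on its own, so it only shows what the naive expectation is, not what Hadoop 0.16.3 or Pig actually did here.

```python
import math

MiB = 1024 * 1024


def split_size(total_size, num_splits_hint, block_size, min_size=1):
    """Split size = max(min_size, min(goal_size, block_size)).

    goal_size is the total input divided by the requested number of map tasks
    (hypothetical helper, modeled on the old-API FileInputFormat rule).
    """
    goal_size = total_size // max(num_splits_hint, 1)
    return max(min_size, min(goal_size, block_size))


def expected_maps(file_sizes, block_size, num_splits_hint=1):
    """Simplified count: each file is cut independently into ceil(size / split) pieces.

    Ignores handling of the final partial split and any Pig-specific slicing.
    """
    total = sum(file_sizes)
    size = split_size(total, num_splits_hint, block_size)
    return sum(math.ceil(f / size) for f in file_sizes)


BLOCK = 67108864  # dfs.block.size reported in the thread (64 MiB)
visit, page = 320 * MiB, 20 * MiB
print(expected_maps([visit], BLOCK))        # 5
print(expected_maps([page], BLOCK))         # 1
print(expected_maps([visit, page], BLOCK))  # 6
```

Note that exactly 320 MiB is exactly five 64 MiB blocks, so a visit file reported as spanning 6 blocks must be somewhat larger than 320 MiB. Either way, the block-based expectation is several map tasks, not 2.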