pig-user  

Re: how many map tasks?

hyoung jun kim
Wed, 07 May 2008 16:56:45 -0700

Thanks, but I checked the Hadoop configuration.
The "dfs.block.size" value is 67108864 (64MB). I also checked the number of
blocks in the input file: it has 6 blocks.
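
The arithmetic behind the expected map count can be sketched roughly as
below. This is only a naive estimate that assumes one map task per
block-sized split, which is not necessarily how Pig or Hadoop actually
computes splits for a join. The helper name expected_map_tasks and the exact
byte sizes are hypothetical. Note that a file of exactly 320MB would span 5
blocks of 64MB, so a count of 6 blocks suggests the file is slightly larger
than 320MB.

```python
BLOCK_SIZE = 67108864  # dfs.block.size from the configuration (64MB)
MB = 1024 * 1024


def expected_map_tasks(file_size_bytes, block_size=BLOCK_SIZE):
    """Naive estimate (an assumption): one map task per block-sized split."""
    # Integer ceiling division; an empty file still counts as one task here.
    return max(1, (file_size_bytes + block_size - 1) // block_size)


if __name__ == "__main__":
    # Hypothetical sizes: the visit file is assumed to be just over 320MB.
    visit_maps = expected_map_tasks(320 * MB + 1)
    page_maps = expected_map_tasks(20 * MB)
    print("visit:", visit_maps, "page:", page_maps,
          "total:", visit_maps + page_maps)
```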


2008/5/8 Vitthal Gogate <[EMAIL PROTECTED]>:

> Sorry, I mean check the "dfs.block.size" parameter in the hadoop-site.xml file
> in the $HADOOP_HOME/conf directory; it may be configured as 128MB.
>
> Sorry, in the following reply I kind of assumed the default size was 128MB :)
>
> regards
>
>
> On 5/7/08 9:42 AM, "Vitthal Gogate" <[EMAIL PROTECTED]> wrote:
>
> > I assume the block size is 128MB. I guess it is specified in the Hadoop site
> > configuration file. Also, for a join the mapred program creates the number of
> > map tasks based on the combined size of both tables/files being joined.
> >
> > -regards, Suhas
> >
> >
> > On 5/7/08 12:57 AM, "hyoung jun kim" <[EMAIL PROTECTED]> wrote:
> >
> >> Hi all,
> >> I read "pig_hadoopsummit.pdf" and tried it.
> >> I made a 320MB file (visit) in dir1 and a 20MB file (page) in dir2.
> >> And ran this script.
> >>
> >> Visits= load '/dir1/visit' as (user, url, time);
> >> Visits= foreach Visits generate user, url, time;
> >> Pages= load '/dir2/page' as (url, pagerank);
> >> VP= join Visits by url, Pages by url;
> >> UserVisits = group VP by user;
> >> Results = foreach UserVisits generate group, AVG(VP.pagerank) as avgpr;
> >> store Results into '/data/users';
> >>
> >> I expected 6 map tasks (320MB/64MB) + 1 map task (20MB).
> >> But Hadoop created 2 map tasks and 1 reduce task.
> >> Why did Hadoop create only 2 map tasks?
> >>
> >> Test environment:
> >>  - 5-node hadoop cluster
> >>  - hadoop 0.16.3
> >>  - pig updated from the svn repository on May 7
> >
>
>