pig-user  

Re: how many map tasks?

Vitthal Gogate
Wed, 07 May 2008 09:44:13 -0700

I assume the block size is 128MB; I think it is set in the Hadoop site
configuration file. Also, for a join, the mapred program creates a number
of map tasks based on the combined size of both tables/files being joined.
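
To make the arithmetic concrete, here is a rough sketch of the map counts you
would get under a simple model of one map task per block-sized split, counted
per file. The function name and the model itself are assumptions for
illustration only, not how Pig or Hadoop actually computes splits.

```python
import math


def expected_map_tasks(file_sizes_mb, block_size_mb):
    """Hypothetical helper: one map task per block-sized split, per file.

    This is a simplifying assumption; the real split logic may differ.
    """
    return sum(math.ceil(size / block_size_mb) for size in file_sizes_mb)


# File sizes from the original message: 320MB (visit) and 20MB (page).
files_mb = [320, 20]

for block_mb in (64, 128):
    print(block_mb, expected_map_tasks(files_mb, block_mb))
```

Under this model a 64MB block size gives 6 map tasks and a 128MB block size
gives 4. Neither equals the 2 map tasks reported, so block size alone may not
account for it.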

-regards, Suhas


On 5/7/08 12:57 AM, "hyoung jun kim" <[EMAIL PROTECTED]> wrote:

> Hi all,
> I read "pig_hadoopsummit.pdf" and tried it.
> I made a 320MB file (visit) in dir1 and a 20MB file (page) in dir2.
> And ran this script.
> 
> Visits= load '/dir1/visit' as (user, url, time);
> Visits= foreach Visits generate user, url, time;
> Pages= load '/dir2/page' as (url, pagerank);
> VP= join Visits by url, Pages by url;
> UserVisits= group VP by user;
> Results = foreach UserVisits generate group, AVG(VP.pagerank) as avgpr;
> store Results into '/data/users';
> 
> I expected 5 maps (320MB/64MB) + 1 map (20MB), 6 map tasks in total.
> But Hadoop created 2 map tasks and 1 reduce task.
> Why did Hadoop create only 2 map tasks?
> 
> Test environment:
>  - 5-node hadoop cluster
>  - hadoop 0.16.3
>  - pig updated from svn repository on May 7