pig-user  

how many map tasks?

hyoung jun kim
Wed, 07 May 2008 00:58:04 -0700

Hi all,
I read "pig_hadoopsummit.pdf" and tried out the example in it.
I created a 320MB file (visit) in dir1 and a 20MB file (page) in dir2,
and ran this script:

Visits= load '/dir1/visit' as (user, url, time);
Visits= foreach Visits generate user, url, time;
Pages= load '/dir2/page' as (url, pagerank);
VP= join Visits by url, Pages by url;
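UserVisits= group VP by user;  -- UserVisits must be defined before use; grouping by user is assumed
Results = foreach UserVisits generate group, AVG(VP.pagerank) as avgpr;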
store Results into '/data/users';

I expected 6 map tasks (320MB/64MB) for visit plus 1 map task for page (20MB).
But Hadoop ran only 2 map tasks and 1 reduce task.
Why did Hadoop create only 2 map tasks?
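
For reference, the arithmetic behind my expectation can be sketched roughly as below. This assumes one map task per 64MB block of each input file, which is only my assumption; the real count depends on how Pig and Hadoop compute input splits. The function name and the sizes in MB are just for illustration. Note that a file of exactly 320MB gives 5 splits under this assumption, so 6 only holds if the file is slightly larger than 320MB.

```python
def expected_map_tasks(file_sizes_mb, block_mb=64):
    """Estimate map tasks assuming one map per block of each input file.

    This is a simplification (an assumption, not Hadoop's actual split logic).
    """
    total = 0
    for size in file_sizes_mb:
        # Integer ceiling division: a partial last block still needs one map.
        total += -(-size // block_mb)
    return total


if __name__ == "__main__":
    # visit file (320MB) and page file (20MB), sizes in whole MB
    print(expected_map_tasks([320, 20]))
```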

Test environment:
 - 5-node Hadoop cluster
 - Hadoop 0.16.3
 - Pig updated from the svn repository on May 7