hyoung jun kim
Wed, 07 May 2008 00:58:04 -0700
Hi all,

I read "pig_hadoopsummit.pdf" and tried the example from it. I created a 320MB file (visit) in dir1 and a 20MB file (page) in dir2, then ran this script:

Visits = load '/dir1/visit' as (user, url, time);
Visits = foreach Visits generate user, url, time;
Pages = load '/dir2/page' as (url, pagerank);
VP = join Visits by url, Pages by url;
UserVisits = group VP by user;
Results = foreach UserVisits generate group, AVG(VP.pagerank) as avgpr;
store Results into '/data/users';

I expected 6 map tasks for visit (320MB/64MB) plus 1 map task for page (20MB). But Hadoop created only 2 map tasks and 1 reduce task. Why did Hadoop create only 2 map tasks?

Test environment:
- 5-node Hadoop cluster
- Hadoop 0.16.3
- Pig updated from the SVN repository on May 7
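
To make the estimate above concrete, here is a minimal Python sketch of the arithmetic behind it. It assumes a 64MB HDFS block size (implied by the 320MB/64MB figure) and one map task per block-sized split, counted separately for each input file. The function name expected_map_tasks is hypothetical and exists only for this illustration. This models the expectation only; it is not a description of how Pig or Hadoop 0.16.3 actually computes splits.

```python
import math

# Assumption: 64MB HDFS block size, as implied by "320MB/64MB" in the post.
BLOCK_SIZE_MB = 64


def expected_map_tasks(file_sizes_mb, block_size_mb=BLOCK_SIZE_MB):
    """Estimate map tasks as one per block-sized split, per input file.

    Hypothetical helper for illustration; not part of Pig or Hadoop.
    """
    return sum(math.ceil(size / block_size_mb) for size in file_sizes_mb)


visit_maps = expected_map_tasks([320])  # the 'visit' file
page_maps = expected_map_tasks([20])    # the 'page' file

print(visit_maps, page_maps, visit_maps + page_maps)  # prints "5 1 6"
```

Under these assumptions a file of exactly 320MB gives 5 splits, and 6 only if it is even slightly larger than 320MB. Either way the estimate is well above the 2 map tasks that were actually launched.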