Oh man, thanks! Your tip comes in handy. =) I had tried setting some properties but it hadn't worked out; now that I use the PARALLEL clause, it works very well. Thanks, dude. Have a nice day.
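For anyone else hitting this, a minimal sketch of how the PARALLEL clause attaches to a reduce-side operator like GROUP (the relation and field names here are just placeholders, not from the original script):

    -- hypothetical relation/field names, for illustration only
    logs = LOAD 'input/data.txt' AS (user:chararray, bytes:long);
    -- PARALLEL 10 requests 10 reduce tasks for this GROUP instead of the default 1
    grpd = GROUP logs BY user PARALLEL 10;
    sums = FOREACH grpd GENERATE group, SUM(logs.bytes);
    STORE sums INTO 'output/sums';

PARALLEL goes on the operators that trigger a reduce phase (GROUP, COGROUP, JOIN, DISTINCT, ORDER); it has no effect on the map side.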
On Thu, Aug 5, 2010 at 10:38 AM, Gibbon, Robert, VF-Group <[email protected]> wrote:

> Use the PARALLEL clause of course!
>
> PARALLEL n
>
> Increase the parallelism of a job by specifying the number of reduce
> tasks, n. The default value for n is 1 (one reduce task). Note the
> following:
>
> * PARALLEL only affects the number of reduce tasks. Map parallelism
>   is determined by the input file, one map for each HDFS block.
> * If you don't specify PARALLEL, you still get the same map
>   parallelism but only one reduce task.
>
> For more information, see the Pig Cookbook.
>
>
> -----Original Message-----
> From: Marcos Pinto [mailto:[email protected]]
> Sent: Thursday, August 5, 2010 14:56
> To: [email protected]
> Subject: Problem: when I run a Pig script I get one reduce task
>
> Hi guys, how are you doing?
>
> I am learning how to use Hadoop and I ran into this problem:
> I set up a cluster with 5 nodes (4 datanodes and 1 namenode) and used the
> same configuration for the jobtracker and tasktrackers.
> When I run a Pig script I get many maps (around 15) but just 1
> reduce!
> This kills all the parallel processing. For example,
> I have a file of about 1 GB, and when I run the Pig script on the cluster
> it takes about 50 minutes to process. =(
>
> So I would really appreciate it if someone could help with any tip. Thanks for
> your time.
