Hi everyone, how are you doing? I'm learning how to use Hadoop and I've run into this problem: I set up a cluster with 5 nodes (1 namenode and 4 datanodes), and the jobtracker and tasktracker use the same configuration. When I run a Pig script I get many map tasks (around 15) but only 1 reduce task, which kills all the parallel processing. For example, I have a 1 GB file, and when I run the Pig script on the cluster it takes about 50 minutes to process. =(
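From what I've read, Pig runs every reduce-side operator with 1 reducer by default unless the script asks for more, either with SET default_parallel or a PARALLEL clause on operators like GROUP, JOIN, or ORDER BY. A minimal sketch of what I think that looks like (the relation names and paths below are made up for illustration, not my actual script):

```pig
-- Hypothetical example; 'input/data.txt' and the relation names are placeholders.
-- Set a default reducer count for all reduce-side operators in this script:
SET default_parallel 8;

lines   = LOAD 'input/data.txt' AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- PARALLEL on a single operator overrides default_parallel for that step:
grouped = GROUP words BY word PARALLEL 8;
counts  = FOREACH grouped GENERATE group, COUNT(words);
STORE counts INTO 'output/wordcount';
```

If that's right, is setting one of these the correct fix, or should the cluster configuration be choosing a reducer count automatically?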
I'd really appreciate any tips. Thanks for your time.
