pig-user  

enforcing number of mappers

prasenjit mukherjee
Sun, 24 Jan 2010 02:46:07 -0800

I want to use Pig to parallelize processing of a number of requests. There
are ~300 requests that need to be processed. Processing each request
consists of the following:
1. Fetch file from s3 to local
2. Do some preprocessing
3. Put it into hdfs

My input is a small file with 300 lines. The problem is that Pig always seems
to create a single mapper, so the load is not distributed across the cluster.
Is there any way to force splitting of small input files as well? Below is
the Pig output, which seems to indicate that there is only 1 mapper. Let me
know if my understanding is wrong.

2010-01-24 05:31:53,148 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2010-01-24 05:31:53,148 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2010-01-24 05:31:55,006 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job

Thanks
-Prasen.
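
One possible workaround, sketched here under assumptions rather than taken from the thread: Hadoop's file-based input formats generally produce at least one input split per non-empty file, so breaking the 300-line request list into several small files and loading the whole directory would typically give one mapper per file. The helper names (split_lines, write_chunks), the file naming pattern, and the chunk count below are all hypothetical. Some later Pig releases may merge small inputs back into a single split, so the resulting mapper count is worth checking on the version in use.

```python
import os


def split_lines(lines, num_chunks):
    """Distribute lines round-robin into at most num_chunks non-empty lists."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    chunks = [[] for _ in range(num_chunks)]
    for index, line in enumerate(lines):
        chunks[index % num_chunks].append(line)
    # Drop empty chunks so no empty files are written for short inputs.
    return [chunk for chunk in chunks if chunk]


def write_chunks(lines, out_dir, num_chunks):
    """Write each chunk to its own file in out_dir and return the file paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, chunk in enumerate(split_lines(lines, num_chunks)):
        # Hypothetical naming scheme; any unique file name would do.
        path = os.path.join(out_dir, "requests-%05d.txt" % index)
        with open(path, "w") as handle:
            handle.write("\n".join(chunk) + "\n")
        paths.append(path)
    return paths
```

With the 300 requests read into a list, write_chunks(requests, "requests_split", 10) would write ten files of 30 lines each. The directory could then be copied to HDFS and passed to LOAD in place of the single input file.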