pig-user  

Re: enforcing number of mappers

Dmitriy Ryaboy
Sun, 24 Jan 2010 04:12:24 -0800

You need to write a custom slicer that enforces your preferred
strategy for determining the number of mappers.

Once the load/store redesign goes in, slicers will go away, and you
will write custom Hadoop InputFormats instead.
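
As a rough illustration of what such a splitting strategy has to compute,
here is a minimal Python sketch. It is not the Pig slicer API and not
Hadoop code; the function name plan_line_splits and the lines_per_split
parameter are hypothetical. It cuts a small input into byte ranges of at
most N lines each, on the assumption that each range would then be handed
to its own mapper.

```python
from typing import List, Tuple


def plan_line_splits(data: bytes, lines_per_split: int) -> List[Tuple[int, int]]:
    """Return (offset, length) byte ranges, each covering at most
    lines_per_split lines of data. Hypothetical helper for illustration."""
    if lines_per_split < 1:
        raise ValueError("lines_per_split must be >= 1")

    splits: List[Tuple[int, int]] = []
    start = 0            # byte offset where the current split begins
    pos = 0              # byte offset just past the last scanned line
    lines_in_split = 0   # lines accumulated in the current split

    for line in data.splitlines(keepends=True):
        pos += len(line)
        lines_in_split += 1
        if lines_in_split == lines_per_split:
            # Close the current split and start a new one.
            splits.append((start, pos - start))
            start = pos
            lines_in_split = 0

    # Emit the final, possibly shorter, split.
    if lines_in_split:
        splits.append((start, pos - start))
    return splits


# A 300-line request file, 10 requests per split.
requests = "".join("request-%d\n" % i for i in range(300)).encode()
splits = plan_line_splits(requests, 10)
print(len(splits))  # 30
```

With lines_per_split set to 1, the same 300-line file yields 300 ranges,
one per request. A larger value trades parallelism for fewer tasks.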
-D

On Sun, Jan 24, 2010 at 2:45 AM, prasenjit mukherjee
<prasen....@gmail.com> wrote:
> I want to use Pig to parallelize processing of a number of requests. There
> are ~300 requests that need to be processed. Processing each request
> consists of the following steps:
> 1. Fetch file from s3 to local
> 2. Do some preprocessing
> 3. Put it into hdfs
>
> My input is a small file with 300 lines. The problem is that Pig always
> seems to create a single mapper, so the load is not properly distributed.
> Is there any way to enforce splitting of small input files as well?
> Below is the Pig output, which seems to indicate that there is only 1
> mapper. Let me know if my understanding is wrong.
>
> 2010-01-24 05:31:53,148 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size before optimization: 1
> 2010-01-24 05:31:53,148 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size after optimization: 1
> 2010-01-24 05:31:55,006 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - Setting up single store job
>
> Thanks
> -Prasen.
>