Hi Manoj

Block size is an HDFS storage-level setting, whereas split size is the amount
of data processed by each mapper while running a MapReduce job (one split is
the data processed by one mapper). One or more HDFS blocks can contribute to a
split. Splits are determined by the InputFormat as well as the min and max
split size properties.
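To make that concrete, here is a small standalone sketch of the split-size
logic (mirroring the max/min clamping used by FileInputFormat.computeSplitSize;
the class and method names below are illustrative, not Hadoop API):

```java
public class SplitSizeSketch {

    // Mirrors FileInputFormat's logic: max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate number of splits (and hence map tasks) for one file
    static long estimateSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long blockSize = 64 * mb;           // 64 MB HDFS block
        long fileSize  = 1024 * mb;         // 1 GB input file

        // Default-ish settings: split size equals block size -> 16 mappers
        long split = computeSplitSize(blockSize, 1, Long.MAX_VALUE);
        System.out.println(estimateSplits(fileSize, split));   // 16

        // Raising the min split size above the block size merges
        // blocks into fewer, larger splits -> 8 mappers
        long bigger = computeSplitSize(blockSize, 128 * mb, Long.MAX_VALUE);
        System.out.println(estimateSplits(fileSize, bigger));  // 8
    }
}
```

Note that with plain FileInputFormat, splits spanning multiple blocks lose
data locality; CombineFileInputFormat exists precisely to pack small
blocks/files into splits more sensibly.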

As Arun mentioned, use CombineFileInputFormat and adjust the min and max split
size properties to control/limit the number of mappers.
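For example, a config fragment showing how the split size bounds can be passed
on the command line (a sketch using the Hadoop 1.x property names
mapred.min.split.size / mapred.max.split.size; the jar and driver names are
placeholders, and the -D options are picked up only if the driver uses
ToolRunner/GenericOptionsParser):

```
hadoop jar myjob.jar MyDriver \
  -D mapred.min.split.size=134217728 \
  -D mapred.max.split.size=268435456 \
  /input /output
```

Here 134217728 and 268435456 are 128 MB and 256 MB respectively, so each
mapper would process at least two 64 MB blocks' worth of data.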


Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Manoj Babu <manoj...@gmail.com>
Date: Wed, 11 Jul 2012 18:17:41 
To: <mapreduce-user@hadoop.apache.org>
Reply-To: mapreduce-user@hadoop.apache.org
Subject: Re: Mapper basic question

Hi Tariq / Arun,

The no. of blocks (splits) = total file size / HDFS block size * replication
value.
So the splits here are again nothing but the blocks.

Other than increasing the block size (input splits), is it possible to limit
the no. of mappers?


Cheers!
Manoj.



On Wed, Jul 11, 2012 at 6:06 PM, Arun C Murthy <a...@hortonworks.com> wrote:

> Take a look at CombineFileInputFormat - this will create 'meta splits'
> which include multiple small splits, thus reducing the number of maps that
> are run.
>
> Arun
>
> On Jul 11, 2012, at 5:29 AM, Manoj Babu wrote:
>
> Hi,
>
> The no. of mappers depends on the no. of blocks. Is it possible to limit
> the no. of mappers without increasing the HDFS block size?
>
> Thanks in advance.
>
> Cheers!
> Manoj.
>
>
>  --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>
