Shai,

You will have to implement MultiFileInputFormat
<http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/MultiFileInputFormat.html>
and set that as your input format.
You may find
http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/examples/MultiFileWordCount.html
useful.
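
Roughly, a minimal subclass could look like the sketch below, assuming the
old org.apache.hadoop.mapred API from 0.20; the class names and the trivial
path-per-record reader are illustrative, not taken from MultiFileWordCount:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MultiFileInputFormat;
import org.apache.hadoop.mapred.MultiFileSplit;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical subclass: MultiFileInputFormat.getSplits() already packs many
// files into roughly the requested number of splits, so all we must supply
// is the record reader.
public class ManyFilesInputFormat extends MultiFileInputFormat<LongWritable, Text> {

  @Override
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new PathPerRecordReader((MultiFileSplit) split);
  }

  // Toy reader: one record per file, with the file's path as the value. A
  // real job would open each path and stream its contents, as the record
  // reader in MultiFileWordCount does.
  static class PathPerRecordReader implements RecordReader<LongWritable, Text> {
    private final MultiFileSplit split;
    private int index = 0;

    PathPerRecordReader(MultiFileSplit split) { this.split = split; }

    public boolean next(LongWritable key, Text value) {
      if (index >= split.getNumPaths()) return false;
      key.set(index);
      value.set(split.getPath(index).toString());
      index++;
      return true;
    }

    public LongWritable createKey() { return new LongWritable(); }
    public Text createValue() { return new Text(); }
    public long getPos() { return index; }
    public void close() { }
    public float getProgress() {
      return split.getNumPaths() == 0 ? 1.0f
          : (float) index / split.getNumPaths();
    }
  }
}

The reason this answers your question: the framework passes the job's
configured number of map tasks to getSplits() as the desired split count, so
conf.setInputFormat(ManyFilesInputFormat.class) combined with
conf.setNumMapTasks(n) groups the input files into roughly n splits, i.e. n
Mappers.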

On Thu, Nov 25, 2010 at 12:01 PM, Shai Erera <ser...@gmail.com> wrote:

> I wasn't asking how to configure the cluster to invoke no more than a
> certain # of Mappers simultaneously. Instead, I'd like to configure a
> (certain) job to invoke exactly N Mappers, where N is the number of cores
> in the cluster, regardless of the size of the data. This is not critical
> if it can't be done, but it can improve the performance of my job if it
> can be done.
>
> Thanks
> Shai
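
If it helps, here is a sketch of how a job could size itself from the
cluster's map-slot capacity at submit time, assuming the old JobClient API.
The slot count is not literally the number of cores, but slots are commonly
configured one per core; the class name here is made up:

import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class CapacitySizedJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(CapacitySizedJob.class);

    // Ask the JobTracker for the cluster-wide map slot capacity.
    JobClient client = new JobClient(conf);
    ClusterStatus status = client.getClusterStatus();
    int maxMapSlots = status.getMaxMapTasks();

    // With a MultiFileInputFormat-style input format this becomes the
    // desired split count; with plain FileInputFormat it is only a hint.
    conf.setNumMapTasks(maxMapSlots);

    // ... set mapper, input/output formats and paths, then:
    // JobClient.runJob(conf);
  }
}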
>
>
> On Thu, Nov 25, 2010 at 9:55 PM, Niels Basjes <ni...@basjes.nl> wrote:
>
>> Hi,
>>
>> 2010/11/25 Shai Erera <ser...@gmail.com>:
>> > Is there a way to make MapReduce create exactly N Mappers? More
>> > specifically, if, say, my data can be split to 200 Mappers, and I have
>> > only 100 cores, how can I ensure only 100 Mappers will be created? The
>> > number of cores is not something I know in advance, so writing a
>> > special InputFormat might be tricky, unless I can query Hadoop for the
>> > available # of cores (in the entire cluster).
>>
>> You can configure, on a node-by-node basis, how many map and reduce
>> tasks the TaskTracker on that node may run simultaneously.
>> This is done in conf/mapred-site.xml using these two settings:
>> mapred.tasktracker.{map|reduce}.tasks.maximum
>>
>> Have a look at this page for more information:
>> http://hadoop.apache.org/common/docs/current/cluster_setup.html
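
For reference, the relevant part of conf/mapred-site.xml could look like the
snippet below; the slot counts are illustrative, so pick values that match
each node's hardware:

<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
    <description>Max map tasks this TaskTracker runs at once.</description>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
    <description>Max reduce tasks this TaskTracker runs at once.</description>
  </property>
</configuration>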
>>
>> --
>> Met vriendelijke groeten,
>>
>> Niels Basjes
>>
>
>
