Thanks, I'll take a look

On Thu, Nov 25, 2010 at 10:20 PM, Shrijeet Paliwal
<shrij...@rocketfuel.com>wrote:

> Shai,
>
> You will have to extend MultiFileInputFormat
> <http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/MultiFileInputFormat.html>
> and set that as your input format.
> You may find
> http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/examples/MultiFileWordCount.html
>  useful.
>
>
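The key idea behind MultiFileInputFormat is that it packs the input files into a requested number of splits rather than creating one split per file, so the number of map tasks can be driven down to N. A rough sketch of the grouping arithmetic, in plain Java with no Hadoop dependencies (class and method names here are illustrative, not part of the Hadoop API; the real implementation balances groups by cumulative file size rather than round-robin):

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPacking {
    // Pack `files` into at most `numSplits` groups, round-robin.
    // Simplified stand-in for what MultiFileInputFormat.getSplits()
    // does when asked for numSplits splits.
    static List<List<String>> pack(List<String> files, int numSplits) {
        List<List<String>> splits = new ArrayList<>();
        int n = Math.min(numSplits, files.size());
        for (int i = 0; i < n; i++) {
            splits.add(new ArrayList<>());
        }
        for (int i = 0; i < files.size(); i++) {
            splits.get(i % n).add(files.get(i));
        }
        return splits;
    }
}
```

With an input format built this way, the requested split count is honored rather than treated as a hint, which is what makes "exactly N mappers" achievable for many small input files.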
> On Thu, Nov 25, 2010 at 12:01 PM, Shai Erera <ser...@gmail.com> wrote:
>
>> I wasn't asking how to configure the cluster to run no more than a
>> certain number of Mappers simultaneously. Instead, I'd like to configure a
>> (certain) job to invoke exactly N Mappers, where N is the number of cores in
>> the cluster, regardless of the size of the data. This is not critical if
>> it can't be done, but it could improve the performance of my job if it can
>> be done.
>>
>> Thanks
>> Shai
>>
>>
>> On Thu, Nov 25, 2010 at 9:55 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>
>>> Hi,
>>>
>>> 2010/11/25 Shai Erera <ser...@gmail.com>:
>>> > Is there a way to make MapReduce create exactly N Mappers? More
>>> > specifically, if say my data can be split to 200 Mappers, and I have
>>> > only 100 cores, how can I ensure only 100 Mappers will be created?
>>> > The number of cores is not something I know in advance, so writing a
>>> > special InputFormat might be tricky, unless I can query Hadoop for
>>> > the available # of cores (in the entire cluster).
>>>
>>> You can configure, on a node-by-node basis, how many map and reduce
>>> tasks the task tracker on that node may run concurrently.
>>> This is done in conf/mapred-site.xml via these two settings:
>>> mapred.tasktracker.{map|reduce}.tasks.maximum
>>>
>>> Have a look at this page for more information
>>> http://hadoop.apache.org/common/docs/current/cluster_setup.html
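Assuming the standard mapred-site.xml layout, the per-node slot limits Niels mentions might look like this (the values are illustrative; a common rule of thumb ties map slots to the node's core count):

```xml
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <!-- e.g. one map slot per core on a 4-core node -->
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```

Note that these settings cap concurrency per node; they do not change how many map tasks the job creates in total, which is what the input format's split count controls.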
>>>
>>> --
>>> Met vriendelijke groeten,
>>>
>>> Niels Basjes
>>>
>>
>>
>