Thanks, I'll take a look.

On Thu, Nov 25, 2010 at 10:20 PM, Shrijeet Paliwal <shrij...@rocketfuel.com> wrote:
> Shai,
>
> You will have to implement MultiFileInputFormat
> <http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/MultiFileInputFormat.html>
> and set that as your input format.
> You may find
> http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/examples/MultiFileWordCount.html
> useful.
>
> On Thu, Nov 25, 2010 at 12:01 PM, Shai Erera <ser...@gmail.com> wrote:
>
>> I wasn't talking about how to configure the cluster to not invoke more
>> than a certain # of Mappers simultaneously. Instead, I'd like to
>> configure a (certain) job to invoke exactly N Mappers, where N is the
>> number of cores in the cluster, regardless of the size of the data.
>> This is not critical if it can't be done, but it can improve the
>> performance of my job if it can be done.
>>
>> Thanks,
>> Shai
>>
>> On Thu, Nov 25, 2010 at 9:55 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>
>>> Hi,
>>>
>>> 2010/11/25 Shai Erera <ser...@gmail.com>:
>>> > Is there a way to make MapReduce create exactly N Mappers? More
>>> > specifically, if say my data can be split into 200 Mappers, and I
>>> > have only 100 cores, how can I ensure only 100 Mappers will be
>>> > created? The number of cores is not something I know in advance, so
>>> > writing a special InputFormat might be tricky, unless I can query
>>> > Hadoop for the available # of cores (in the entire cluster).
>>>
>>> You can configure, on a node-by-node basis, how many map and reduce
>>> tasks the task tracker on that node may run at once.
>>> This is done via conf/mapred-site.xml using these two settings:
>>> mapred.tasktracker.{map|reduce}.tasks.maximum
>>>
>>> Have a look at this page for more information:
>>> http://hadoop.apache.org/common/docs/current/cluster_setup.html
>>>
>>> --
>>> With kind regards,
>>>
>>> Niels Basjes
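To make Shrijeet's suggestion concrete, here is a minimal sketch against the
old org.apache.hadoop.mapred API from r0.20 that the links above refer to.
MultiFileInputFormat packs all input files into the number of splits requested
via JobConf.setNumMapTasks(), so asking for N map tasks yields at most N
mappers regardless of how many files there are. The class names below
(CoreCountInputFormat, MultiFileLineReader) are illustrative placeholders, not
from the thread; MultiFileWordCount in the examples package has a complete
equivalent.

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MultiFileInputFormat;
import org.apache.hadoop.mapred.MultiFileSplit;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.LineReader;

// Sketch only: packs all input files into the number of splits requested
// via JobConf.setNumMapTasks(), instead of one split per file/block.
public class CoreCountInputFormat extends MultiFileInputFormat<LongWritable, Text> {

  @Override
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new MultiFileLineReader((MultiFileSplit) split, job);
  }

  // Reads the files of one MultiFileSplit line by line, one file after another.
  static class MultiFileLineReader implements RecordReader<LongWritable, Text> {
    private final MultiFileSplit split;
    private final JobConf job;
    private int fileIdx = 0;      // index of the file currently being read
    private LineReader in = null;
    private long pos = 0;

    MultiFileLineReader(MultiFileSplit split, JobConf job) {
      this.split = split;
      this.job = job;
    }

    public boolean next(LongWritable key, Text value) throws IOException {
      while (true) {
        if (in == null) {
          if (fileIdx >= split.getNumPaths()) {
            return false;                  // no files left in this split
          }
          Path path = split.getPath(fileIdx++);
          FileSystem fs = path.getFileSystem(job);
          in = new LineReader(fs.open(path), job);
        }
        int read = in.readLine(value);
        if (read > 0) {
          key.set(pos);
          pos += read;
          return true;
        }
        in.close();                        // end of file: move to the next one
        in = null;
      }
    }

    public LongWritable createKey() { return new LongWritable(); }
    public Text createValue() { return new Text(); }
    public long getPos() { return pos; }
    public float getProgress() {
      return split.getNumPaths() == 0 ? 1.0f : fileIdx / (float) split.getNumPaths();
    }
    public void close() throws IOException {
      if (in != null) {
        in.close();
      }
    }
  }
}

The job would then be set up with job.setInputFormat(CoreCountInputFormat.class)
and job.setNumMapTasks(numCores). One caveat: MultiFileInputFormat groups whole
files, so a single very large file still goes to a single mapper.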
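For reference, the per-node cap Niels describes is a different knob: it limits
how many tasks run concurrently on each node, not how many map tasks a job
gets. It goes into conf/mapred-site.xml on every tasktracker; the values below
are illustrative for an 8-core node, not recommendations.

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
  <!-- at most 8 concurrent map tasks on this node -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
  <!-- at most 4 concurrent reduce tasks on this node -->
</property>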