Have a look at CombineFileInputFormat - it can process multiple splits per map.
Tom

On Fri, Jan 15, 2010 at 4:11 PM, Clements, Michael
<michael.cleme...@disney.com> wrote:
> I've been exploring the same question lately: capping max simultaneous
> tasks per node.
>
> A file split approach would work, though it may be an indirect way of
> doing it.
>
> In many cases it would be cleaner and much easier to have a max task cap
> setting, for example this could/should be configurable in a Fair
> Scheduler pool setting.
>
> But there currently doesn't exist in Hadoop any simple means (that I
> know of) to set a max cap on tasks per machine, for a specific job (or
> pool of jobs). You have the configured setting, which is applied
> globally. If one or a few specific jobs need a different max, you're
> stuck.
>
> So the file split size approach, while indirect and more complex than a
> config setting, is the only one that I know of.
>
> The question actually has some subtlety because there is the total # of
> tasks for the job, and the # that will run simultaneously. In some
> cases, it's OK if there are a lot of tasks, so long as only 1 (or some
> other max cap) at a time runs per machine. In other cases, you need to
> limit the total # of tasks regardless of how many run simultaneously.
> The file split approach will control the total # of tasks for the job,
> which may impact (directly or indirectly) the # that run simultaneously.
>
> -----Original Message-----
> From:
> mapreduce-dev-return-1367-michael.clements=disney....@hadoop.apache.org
> [mailto:mapreduce-dev-return-1367-michael.clements=disney....@hadoop.apache.org]
> On Behalf Of Allen Wittenauer
> Sent: Friday, January 15, 2010 4:00 PM
> To: mapreduce-dev@hadoop.apache.org
> Subject: Re: why one mapper process per block?
>
> On 1/15/10 3:55 PM, "Erez Katz" <erez_k...@yahoo.com> wrote:
>> What would it take to pipe ALL the blocks that are part of the input set, on
>> a given node, to ONE mapper process?
>
> Probably just setting mapred.min.split.size to a high enough value.
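
To illustrate the mapred.min.split.size suggestion quoted above, here is a minimal sketch in plain Java, with no Hadoop dependency, of the split-size arithmetic as I understand it from the old-API FileInputFormat: split size = max(minSize, min(goalSize, blockSize)). The class name, file size, block size, and requested map count are made up for illustration, and the real code also applies a small slop factor to the last split, which is ignored here.

```java
// Sketch only: mirrors the split-size arithmetic, it does not call Hadoop.
// All names and sizes below are hypothetical.
class SplitSizeSketch {

    // Same shape as the old-API FileInputFormat rule:
    // max(minSize, min(goalSize, blockSize)).
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Number of splits for a single file, rounding up (slop factor ignored).
    static long countSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        long fileSize = 1024 * MB;     // hypothetical 1 GB input file
        long blockSize = 64 * MB;      // hypothetical HDFS block size
        long goalSize = fileSize / 2;  // total size / requested map count (2 here)

        // With a tiny minimum, the block size wins: one split per block.
        long defaultSplit = computeSplitSize(goalSize, 1, blockSize);
        System.out.println("min=1 byte: " + countSplits(fileSize, defaultSplit) + " splits");

        // With the minimum raised to the file size, the whole file is one split.
        long bigSplit = computeSplitSize(goalSize, fileSize, blockSize);
        System.out.println("min=1 GB:   " + countSplits(fileSize, bigSplit) + " splits");
    }
}
```

Note that this only reduces the number of splits within each file. It does not merge blocks from different files into one split, which is the case CombineFileInputFormat addresses, and a split that spans several blocks will generally read some of them from other nodes.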