Hi Brad,

mapred.tasktracker.map.tasks.maximum is a parameter read by the TaskTracker when it starts up. It cannot be changed per-job.
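
As a rough sketch of what that means in practice (assuming the value of 7 from your mail, and that each worker reads its own copy of mapred-site.xml), the property needs to look like this in the config on every TaskTracker node, and the TaskTracker daemon needs a restart after any change:

```xml
<!-- mapred-site.xml on each TaskTracker (worker) node.
     Read once when the TaskTracker daemon starts; the value 7 is taken
     from the original mail and is only an example. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>7</value>
  </property>
</configuration>
```

The value shown by "set -v" or in job.xml is the submitting client's view of the config, so overriding it there does not change what an already running TaskTracker uses.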

Hope that helps
-Todd

On Fri, Sep 11, 2009 at 2:06 PM, Brad Heintz <[email protected]> wrote:
> TIA if anyone can point me in the right direction on this.
>
> I'm running a simple Hive query (a count on an external table comprising
> 436 files, each of ~2GB). The cluster's mapred-site.xml specifies
> mapred.tasktracker.map.tasks.maximum = 7 - that is, 7 mappers per worker
> node. When I run regular MR jobs via "bin/hadoop jar myJob.jar...", I see 7
> mappers spawned on each worker.
>
> The problem: When I run my Hive query, I see 2 mappers spawned per worker.
>
> When I do "set -v;" from the Hive command line, I see
> mapred.tasktracker.map.tasks.maximum = 7.
>
> The job.xml for the Hive query shows mapred.tasktracker.map.tasks.maximum =
> 7.
>
> The only lead I have is that the default for
> mapred.tasktracker.map.tasks.maximum is 2, and even though it's overridden
> in the cluster's mapred-site.xml I've tried redundantly overriding this
> variable everyplace I can think of (Hive command line with "-hiveconf",
> using set from the Hive prompt, et al) and nothing works. I've combed the
> docs & mailing list, but haven't run across the answer.
>
> Does anyone have any ideas what (if anything) I'm missing? Is this some
> quirk of Hive, where it decides that 2 mappers per tasktracker is enough,
> and I should just leave it alone? Or is there some knob I can fiddle to get
> it to use my cluster at full power?
>
> Many thanks in advance,
> - Brad
>
> --
> Brad Heintz
> [email protected]
>
