No, I'm using vanilla 0.20.0. Other, non-Hive jobs are also running with more mappers, so I don't think it'd be that setting even if I had it available.
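
A minimal sketch of the job.xml comparison described in the quoted message below, in case anyone wants to repeat it. It assumes both files use the usual Hadoop layout of `<property>` elements with `<name>` and `<value>` children inside a `<configuration>` root. The function names `load_props` and `diff_props` are made up for illustration and are not part of Hadoop or Hive.

```python
import xml.etree.ElementTree as ET


def load_props(xml_text):
    """Parse a Hadoop-style <configuration> document into a name -> value dict.

    Assumption: every setting is a <property> element with <name> and
    <value> children, as in a typical job.xml or mapred-site.xml.
    """
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing or empty <value> is stored as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_props(a, b):
    """Return {name: (value_in_a, value_in_b)} for every property that differs.

    A property that is absent on one side shows up with None on that side.
    """
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

To use it, read the two job.xml files into strings, pass each to `load_props`, and hand the resulting dicts to `diff_props`. An empty result means the two jobs were submitted with identical settings, which points at something outside the job configuration.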

On Fri, Sep 11, 2009 at 5:28 PM, Todd Lipcon <[email protected]> wrote:

> Hrm... sorry, I didn't read your original query closely enough.
>
> I'm not sure what could be causing this. The map.tasks.maximum parameter
> shouldn't affect it at all - it only affects the number of slots on the
> trackers.
>
> By any chance do you have mapred.max.maps.per.node set? This is a
> configuration parameter added by HADOOP-5170 - it's not in trunk or the
> vanilla 0.18.3 release, but if you're running Cloudera's 0.18.3 release this
> parameter could cause the behavior you're seeing. However, it would
> certainly not default to 2, so I'd be surprised if that were it.
>
> -Todd
>
>
> On Fri, Sep 11, 2009 at 2:20 PM, Brad Heintz <[email protected]> wrote:
>
>> Todd -
>>
>> Of course; it makes sense that it would be that way. But I'm still left
>> wondering why, then, my Hive queries are only using 2 mappers per task
>> tracker when other jobs use 7. I've gone so far as to diff the job.xml
>> files from a regular job and a Hive query, and didn't turn up anything -
>> though clearly, it has to be something Hive is doing.
>>
>> Thanks,
>> - Brad
>>
>>
>>
>> On Fri, Sep 11, 2009 at 5:16 PM, Todd Lipcon <[email protected]> wrote:
>>
>>> Hi Brad,
>>>
>>> mapred.tasktracker.map.tasks.maximum is a parameter read by the
>>> TaskTracker when it starts up. It cannot be changed per-job.
>>>
>>> Hope that helps
>>> -Todd
>>>
>>>
>>> On Fri, Sep 11, 2009 at 2:06 PM, Brad Heintz <[email protected]> wrote:
>>>
>>>> TIA if anyone can point me in the right direction on this.
>>>>
>>>> I'm running a simple Hive query (a count on an external table comprising
>>>> 436 files, each of ~2GB). The cluster's mapred-site.xml specifies
>>>> mapred.tasktracker.map.tasks.maximum = 7 - that is, 7 mappers per worker
>>>> node. When I run regular MR jobs via "bin/hadoop jar myJob.jar...", I see
>>>> 7 mappers spawned on each worker.
>>>>
>>>> The problem: When I run my Hive query, I see 2 mappers spawned per
>>>> worker.
>>>>
>>>> When I do "set -v;" from the Hive command line, I see
>>>> mapred.tasktracker.map.tasks.maximum = 7.
>>>>
>>>> The job.xml for the Hive query shows
>>>> mapred.tasktracker.map.tasks.maximum = 7.
>>>>
>>>> The only lead I have is that the default for
>>>> mapred.tasktracker.map.tasks.maximum is 2, and even though it's overridden
>>>> in the cluster's mapred-site.xml I've tried redundantly overriding this
>>>> variable everyplace I can think of (Hive command line with "-hiveconf",
>>>> using set from the Hive prompt, et al) and nothing works. I've combed the
>>>> docs & mailing list, but haven't run across the answer.
>>>>
>>>> Does anyone have any ideas what (if anything) I'm missing? Is this some
>>>> quirk of Hive, where it decides that 2 mappers per tasktracker is enough,
>>>> and I should just leave it alone? Or is there some knob I can fiddle to
>>>> get it to use my cluster at full power?
>>>>
>>>> Many thanks in advance,
>>>> - Brad
>>>>
>>>> --
>>>> Brad Heintz
>>>> [email protected]
>>>>
>>>
>>>
>>
>>
>> --
>> Brad Heintz
>> [email protected]
>>
>
>

--
Brad Heintz
[email protected]
