Strange behavior during Hive queries

Brad Heintz Fri, 11 Sep 2009 14:07:10 -0700

TIA if anyone can point me in the right direction on this.

I'm running a simple Hive query (a count on an external table comprising 436
files, each of ~2GB).  The cluster's mapred-site.xml specifies
mapred.tasktracker.map.tasks.maximum = 7 - that is, 7 mappers per worker
node.  When I run regular MR jobs via "bin/hadoop jar myJob.jar...", I see 7
mappers spawned on each worker.


The problem:  When I run my Hive query, I see 2 mappers spawned per worker.
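
A rough sketch of what the gap between 2 and 7 map slots per worker means for this query. The 64 MB split size (the stock HDFS block size), the assumption that the files are splittable, and the worker count of 10 are all placeholders for illustration, not values taken from this cluster:

```python
def estimate_map_tasks(num_files, file_size_bytes, split_size_bytes):
    """Estimate map tasks as one task per input split (ceiling per file)."""
    splits_per_file = -(-file_size_bytes // split_size_bytes)  # integer ceiling
    return num_files * splits_per_file


def map_waves(num_tasks, workers, slots_per_worker):
    """Number of sequential 'waves' needed to run all map tasks."""
    concurrent = workers * slots_per_worker
    return -(-num_tasks // concurrent)  # integer ceiling


if __name__ == "__main__":
    GB = 1024 ** 3
    MB = 1024 ** 2

    # 436 files of ~2 GB come from the post; the 64 MB split size is assumed.
    tasks = estimate_map_tasks(436, 2 * GB, 64 * MB)

    # The worker count is hypothetical; the post does not give the cluster size.
    workers = 10
    for slots in (7, 2):
        print(f"{slots} slots/worker: {tasks} map tasks in "
              f"{map_waves(tasks, workers, slots)} waves")
```

Under these assumptions the job needs roughly 3.5 times as many map waves at 2 slots per worker as at 7, so the difference is worth chasing.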

When I do "set -v;" from the Hive command line, I see
mapred.tasktracker.map.tasks.maximum = 7.

The job.xml for the Hive query shows mapred.tasktracker.map.tasks.maximum =
7.

The only lead I have is that the default for
mapred.tasktracker.map.tasks.maximum is 2.  It's overridden in the cluster's
mapred-site.xml, and I've also tried redundantly overriding it everywhere I
can think of (the Hive command line with "-hiveconf", "set" from the Hive
prompt, etc.), but nothing works.  I've combed the docs & mailing list, but
haven't run across the answer.

Does anyone have any ideas what (if anything) I'm missing?  Is this some
quirk of Hive, where it decides that 2 mappers per tasktracker is enough,
and I should just leave it alone?  Or is there some knob I can fiddle to get
it to use my cluster at full power?

Many thanks in advance,
- Brad

-- 
Brad Heintz
[email protected]
