Hello,

I'm trying to tune terasort on a small cluster (4 identical slave
nodes with 4 disks and 16 GB RAM each), but I'm having problems with
very uneven load.

For teragen, I specify 24 mappers, but for some reason they all run on
only 2 of the 4 nodes, even though the web UI (for both YARN and HDFS)
shows all 4 nodes available. Similarly, I specify 16 reducers for
terasort, but the reducers seem to run on only 3 of the 4 nodes. Do I
have something configured wrong, or does the scheduler not attempt to
spread out the load? Besides hurting performance, this also causes me
to run out of disk space on large jobs, since the data is not being
spread out evenly.

Currently, I'm using these settings (not shown as XML for brevity):

yarn-site.xml:
yarn.nodemanager.resource.memory-mb=13824

mapred-site.xml:
mapreduce.map.memory.mb=768
mapreduce.map.java.opts=-Xmx512M
mapreduce.reduce.memory.mb=2304
mapreduce.reduce.java.opts=-Xmx2048M
mapreduce.task.io.sort.mb=512
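
As a rough sanity check on the settings above, here is a minimal sketch
of the container arithmetic. It assumes containers are packed purely by
memory, and it ignores the ApplicationMaster container and any rounding
to the scheduler's minimum allocation; the function names are
hypothetical and only for illustration. Under those assumptions, the
memory settings alone allow 24 mappers to fit on 2 nodes and 16 reducers
on 3 nodes.

```python
import math

# Values taken from the settings listed above.
NODE_MEMORY_MB = 13824     # yarn.nodemanager.resource.memory-mb
MAP_MEMORY_MB = 768        # mapreduce.map.memory.mb
REDUCE_MEMORY_MB = 2304    # mapreduce.reduce.memory.mb


def containers_per_node(node_mb, container_mb):
    """How many containers of the given size fit in one node's memory."""
    return node_mb // container_mb


def min_nodes_needed(tasks, node_mb, container_mb):
    """Fewest nodes that can hold all tasks if each node is filled completely."""
    return math.ceil(tasks / containers_per_node(node_mb, container_mb))


maps_per_node = containers_per_node(NODE_MEMORY_MB, MAP_MEMORY_MB)
reduces_per_node = containers_per_node(NODE_MEMORY_MB, REDUCE_MEMORY_MB)
map_nodes = min_nodes_needed(24, NODE_MEMORY_MB, MAP_MEMORY_MB)
reduce_nodes = min_nodes_needed(16, NODE_MEMORY_MB, REDUCE_MEMORY_MB)

print(f"map containers per node:    {maps_per_node}")
print(f"reduce containers per node: {reduces_per_node}")
print(f"nodes needed for 24 maps:    {map_nodes}")
print(f"nodes needed for 16 reduces: {reduce_nodes}")
```

This only shows what the memory limits permit, not what the scheduler
will actually choose to do.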

In case it's significant, I've scripted the cluster setup and terasort
jobs, so everything runs back-to-back with no pauses, except that I poll
to ensure that HDFS is up and has active data nodes before running
teragen. I've also tried adding delays, but they didn't seem to have
any effect, so I don't *think* it's a start-up race issue.

Thanks for any advice,
Trevor
