Trevor,

On May 30, 2012, at 1:38 PM, Trevor Robinson wrote:
> Jeff:
>
> Thanks for the corroboration and advice. I can't retreat to 0.20, and must forge ahead with 2.0, so I'll share any progress.
>
> Arun:
>
> I haven't set the minimum container size. Do you know the default? Is there a way to easily find defaults (more complete/reliable than the docs)?

The default is way too low (128MB). I've opened https://issues.apache.org/jira/browse/MAPREDUCE-4316 to fix it.

> Thanks, I'll give that a try. How does minimum container size relate to settings like mapreduce.map.memory.mb? Would it essentially raise my 768M map memory allocation to 1G? Does the CapacityScheduler need any additional configuration to function optimally? BTW, what is the default scheduler?

The minimum container size should be >= mapreduce.map.memory.mb. If you have a heap size of 768M for your maps, it should be perfectly OK to have a minimum container size of 1024M. I'd also look into bumping mapreduce.reduce.memory.mb to 2 * minimum-container-size, i.e. have each reduce use 2 containers' worth of memory.

The default scheduler is the FifoScheduler, which isn't as mature as the CapacityScheduler (that's what we use for tuning etc.; see http://hortonworks.com/blog/delivering-on-hadoop-next-benchmarking-performance/ for more details).

> I should have MAPREDUCE-3641. I'm using 0.23.1 with CDH4b2 patches (and a few Java 7/Ubuntu 12.04 build patches). How does 2.0.0-alpha compare to 0.23.1?

I'm not familiar with the CDH patchsets, but hadoop-2.0.0-alpha is a significant improvement over hadoop-0.23.1.

> If there's anything I can do to assist with the issue of spreading out map tasks, please let me know. Is there a JIRA issue for it (or if not, should there be)?

For smaller clusters, https://issues.apache.org/jira/browse/MAPREDUCE-3210 could help.

> Incidentally, my current benchmarking work on x86 is only a training ground and baseline before moving onto ARM-based systems, which have 4GB RAM and generally fewer, smaller (2.5" form factor) disks per node. It sounds like the smaller RAM will force better distribution, but the disk capacity/utilization situation will be more severe.

Right, smaller RAM should force better distribution. I'd love to get more feedback from your ARM work, thanks!

Arun
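[Editor's sketch: the concrete settings implied by the advice above, in the same property=value shorthand Trevor uses in the quoted message below. The values and the scheduler class name illustrate the suggestion rather than a tested configuration; with a 2048M reduce container, mapreduce.reduce.java.opts would also need to drop below -Xmx2048M to leave some headroom.

yarn-site.xml:
yarn.scheduler.minimum-allocation-mb=1024
yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler

mapred-site.xml:
mapreduce.reduce.memory.mb=2048

As for finding defaults: unset properties typically fall back to the values in yarn-default.xml and mapred-default.xml, which ship inside the corresponding Hadoop jars and are also published with the documentation.]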
> Thanks,
> Trevor
>
> On Tue, May 29, 2012 at 6:21 PM, Arun C Murthy <a...@hortonworks.com> wrote:
>> What is the minimum container size, i.e. yarn.scheduler.minimum-allocation-mb?
>>
>> I'd bump it up to at least 1G and use the CapacityScheduler for performance tests:
>> http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>>
>> In the case of teragen, the job has no locality at all (since it's just generating data from 'random' input splits), so the maps end up stuck on fewer nodes, since each node can hold so many containers.
>>
>> The reduces should be better spread if you are using the CapacityScheduler and have https://issues.apache.org/jira/browse/MAPREDUCE-3641 in your build, i.e. hadoop-0.23.1 or hadoop-2.0.0-alpha (I'd use the latter).
>>
>> Also, FYI, the CapacityScheduler currently makes the tradeoff that node-locality is almost the same as rack-locality, and hence you might see maps not spread out for terasort. I'll fix that one soon.
>>
>> hth,
>> Arun
>>
>> On May 29, 2012, at 2:33 PM, Trevor Robinson wrote:
>>
>> Hello,
>>
>> I'm trying to tune terasort on a small cluster (4 identical slave nodes w/ 4 disks and 16GB RAM each), but I'm having problems with very uneven load.
>>
>> For teragen, I specify 24 mappers, but for some reason only 2 nodes out of 4 run them all, even though the web UI (for both YARN and HDFS) shows all 4 nodes available. Similarly, I specify 16 reducers for terasort, but the reducers seem to run on only 3 nodes out of 4. Do I have something configured wrong, or does the scheduler not attempt to spread out the load? In addition to performing sub-optimally, this also causes me to run out of disk space on large jobs, since the data is not spread out evenly.
>>
>> Currently, I'm using these settings (not shown as XML for brevity):
>>
>> yarn-site.xml:
>> yarn.nodemanager.resource.memory-mb=13824
>>
>> mapred-site.xml:
>> mapreduce.map.memory.mb=768
>> mapreduce.map.java.opts=-Xmx512M
>> mapreduce.reduce.memory.mb=2304
>> mapreduce.reduce.java.opts=-Xmx2048M
>> mapreduce.task.io.sort.mb=512
>>
>> In case it's significant, I've scripted the cluster setup and terasort jobs, so everything runs back-to-back immediately, except that I poll to ensure that HDFS is up and has active data nodes before running teragen. I've also tried adding delays, but they didn't seem to have any effect, so I don't *think* it's a start-up race issue.
>>
>> Thanks for any advice,
>> Trevor
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
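[Editor's note: the uneven spread follows from the container arithmetic in the settings quoted above (a rough sketch; actual placement also depends on scheduler behavior and node heartbeat timing):

13824 MB per node / 768 MB per map container = 18 concurrent maps per node, so all 24 maps fit on just 2 nodes
13824 MB per node / 2304 MB per reduce container = 6 concurrent reduces per node, so 16 reduces fit on 3 nodes

With a 1024 MB minimum allocation, the 768 MB map request is rounded up to 1024 MB, giving at most 13 maps per node; that still leaves room for all 24 maps on 2 nodes, so raising the minimum allocation alone does not force the maps onto all 4 nodes (hence the pointer to MAPREDUCE-3210 above).]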