Thanks, Arun. Switching to the CapacityScheduler seems to have fixed much of the issue: TeraGen and TeraSort are now evenly distributed and run almost twice as fast. However, TeraValidate ran on only one node, leaving the other 3 completely idle (except for the AM). I browsed the block locations of the output partitions and they seem to be pretty evenly distributed. Any idea why all 17 containers (16 map, 1 reduce) would run on a single node for this job? I left the minimum container size at 768 MB (the mapper size), so packing all 17 containers onto a single 16 GB machine is a valid allocation, but I don't know why the CS didn't spread this job out like it did the others.
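In case it helps with reproducing this: one way to see where the output blocks actually landed is fsck, e.g. something along these lines (with /user/trevor/terasort-out standing in for the real output path):

    hdfs fsck /user/trevor/terasort-out -files -blocks -locations

That lists each partition file, its blocks, and the datanodes holding each replica.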
BTW, I just noticed MAPREDUCE-4335 <https://issues.apache.org/jira/browse/MAPREDUCE-4335> (Change the default scheduler to the CapacityScheduler), so it looks like this will be fixed by default in the next release.

I'll keep you posted on the ARM work. Just keeping it building is a bit of work (e.g. HADOOP-8538), but now that I'm getting a handle on x86 performance tuning with MRv2, I should have some ARM results soon.

-Trevor

On Tue, Jun 5, 2012 at 7:35 AM, Arun C Murthy <a...@hortonworks.com> wrote:
> Trevor,
>
> On May 30, 2012, at 1:38 PM, Trevor Robinson wrote:
>
> Jeff:
>
> Thanks for the corroboration and advice. I can't retreat to 0.20, and must forge ahead with 2.0, so I'll share any progress.
>
> Arun:
>
> I haven't set the minimum container size. Do you know the default? Is there a way to easily find defaults (more complete/reliable than the docs)?
>
> The default is way too low (128M), which is non-optimal. I've opened https://issues.apache.org/jira/browse/MAPREDUCE-4316 to fix it.
>
> Thanks, I'll give that a try. How does the minimum container size relate to settings like mapreduce.map.memory.mb? Would it essentially raise my 768M map memory allocation to 1G? Does the CapacityScheduler need any additional configuration to function optimally? BTW, what is the default scheduler?
>
> The min. container size should be >= mapreduce.map.memory.mb. If you have a heap size of 768M for your maps, it should be perfectly ok to have a min-container size of 1024M.
>
> Hmm... I'd also look into bumping mapreduce.reduce.memory.mb to be 2*min-container-size, i.e. use 2 containers.
>
> The default scheduler is the FifoScheduler, which isn't as mature as the CS (which is what we use for tuning etc.; see http://hortonworks.com/blog/delivering-on-hadoop-next-benchmarking-performance/ for more details).
>
> I should have MAPREDUCE-3641. I'm using 0.23.1 with CDH4b2 patches (and a few Java 7/Ubuntu 12.04 build patches). How does 2.0.0-alpha compare to 0.23.1?
>
> I'm not familiar with patchsets on CDH, but hadoop-2.0.0-alpha is a significant improvement on hadoop-0.23.1...
>
> If there's anything I can do to assist with the issue of spreading out map tasks, please let me know. Is there a JIRA issue for it (or if not, should there be)?
>
> For smaller clusters, https://issues.apache.org/jira/browse/MAPREDUCE-3210 could help.
>
> Incidentally, my current benchmarking work on x86 is only a training ground and baseline before moving onto ARM-based systems, which have 4GB RAM and generally fewer, smaller (2.5" form factor) disks per node. It sounds like the smaller RAM will force better distribution, but the disk capacity/utilization situation will be more severe.
>
> Right, smaller RAM should force better distribution.
>
> Love to get more f/b from your ARM work, thanks!
>
> Arun
>
> Thanks,
> Trevor
>
> On Tue, May 29, 2012 at 6:21 PM, Arun C Murthy <a...@hortonworks.com> wrote:
> > What is the minimum container size, i.e. yarn.scheduler.minimum-allocation-mb?
> >
> > I'd bump it up to at least 1G and use the CapacityScheduler for performance tests:
> > http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
> >
> > In case of teragen, the job has no locality at all (since it's just generating data from 'random' input-splits) and hence you are getting them stuck on fewer nodes since you have so many containers on each node.
> >
> > The reduces should be better spread if you are using CapacityScheduler and have https://issues.apache.org/jira/browse/MAPREDUCE-3641 in your build, i.e. hadoop-0.23.1 or hadoop-2.0.0-alpha (I'd use the latter).
> >
> > Also, FYI, currently the CS makes the tradeoff that node-locality is almost same as rack-locality and hence you might see maps not spread out for terasort. I'll fix that one soon.
> >
> > hth,
> > Arun
> >
> > On May 29, 2012, at 2:33 PM, Trevor Robinson wrote:
> >
> > Hello,
> >
> > I'm trying to tune terasort on a small cluster (4 identical slave nodes w/ 4 disks and 16GB RAM each), but I'm having problems with very uneven load.
> >
> > For teragen, I specify 24 mappers, but for some reason, only 2 nodes out of 4 run them all, even though the web UI (for both YARN and HDFS) shows all 4 nodes available. Similarly, I specify 16 reducers for terasort, but the reducers seem to run on 3 nodes out of 4. Do I have something configured wrong, or does the scheduler not attempt to spread out the load? In addition to performing sub-optimally, this also causes me to run out of disk space for large jobs, since the data is not being spread out evenly.
> >
> > Currently, I'm using these settings (not shown as XML for brevity):
> >
> > yarn-site.xml:
> > yarn.nodemanager.resource.memory-mb=13824
> >
> > mapred-site.xml:
> > mapreduce.map.memory.mb=768
> > mapreduce.map.java.opts=-Xmx512M
> > mapreduce.reduce.memory.mb=2304
> > mapreduce.reduce.java.opts=-Xmx2048M
> > mapreduce.task.io.sort.mb=512
> >
> > In case it's significant, I've scripted the cluster setup and terasort jobs, so everything runs back-to-back instantly, except that I poll to ensure that HDFS is up and has active data nodes before running teragen. I've also tried adding delays, but they didn't seem to have any effect, so I don't *think* it's a start-up race issue.
> >
> > Thanks for any advice,
> > Trevor
> >
> > --
> > Arun C. Murthy
> > Hortonworks Inc.
> > http://hortonworks.com/
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
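For anyone landing on this thread later, the suggestions above amount to roughly the following (same key=value shorthand as the earlier settings list, not literal XML; the numbers come from the thread rather than independent tuning, and this assumes the stock capacity-scheduler.xml with its single default queue is in place):

    yarn-site.xml:
    yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
    yarn.scheduler.minimum-allocation-mb=1024

    mapred-site.xml:
    mapreduce.reduce.memory.mb=2048

With a 1024M minimum, the existing mapreduce.map.memory.mb=768 requests are simply rounded up to one 1024M container per map, so the map settings can stay as they are. The reduce value follows Arun's 2*min-container-size suggestion; it would also mean keeping mapreduce.reduce.java.opts somewhat below -Xmx2048M so the heap fits inside the container with some headroom.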