Hi,
I'm commissioning a new Hadoop cluster with the following spec:
45 x data nodes:
- 2 x Quad-Core AMD Opteron(tm) Processor 2378
- 16GB RAM
- 4 x WDC WD1002FBYS 1TB SATA drives (configured as separate ext4
filesystems)
3 x name nodes:
- 2 x Quad-Core AMD Opteron(tm) Processor 2378
- 32GB RAM
- 2 x WDC WD1002FBYS 1TB SATA drives (in software RAID1 config and ext4
filesystem)
All nodes are running Debian testing/squeeze.
I'm benchmarking with TeraSort, run as follows:
hadoop jar hadoop-0.20.2-examples.jar teragen -Dmapred.map.tasks=8000 \
    10000000000 /terasort/in
hadoop jar hadoop-0.20.2-examples.jar terasort -Dmapred.reduce.tasks=530 \
    /terasort/in /terasort/out
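As a rough sanity check on what these two commands ask of the cluster, here is a small sizing sketch. It assumes TeraGen's fixed 100-byte rows and an even spread of data across the 45 data nodes before replication; the variable names are only illustrative.

```shell
#!/bin/bash
# Back-of-envelope sizing for the TeraSort run described above.
rows=10000000000      # row count passed to teragen
row_bytes=100         # assumption: TeraGen writes fixed 100-byte rows
maps=8000             # -Dmapred.map.tasks for teragen
reduces=530           # -Dmapred.reduce.tasks for terasort
datanodes=45          # data nodes in the cluster

total_bytes=$((rows * row_bytes))
bytes_per_map=$((total_bytes / maps))
mb_per_reduce=$((total_bytes / reduces / 1000000))
gb_per_node=$((total_bytes / datanodes / 1000000000))

echo "total input:        $((total_bytes / 1000000000)) GB"
echo "written per map:    $((bytes_per_map / 1000000)) MB"
echo "sorted per reduce:  ${mb_per_reduce} MB (rounded down)"
echo "input per datanode: ${gb_per_node} GB (rounded down, before replication)"
```

So the run generates about 1 TB of input, which is the figure to keep in mind when comparing the 23 minute result against other clusters.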
When I run this on the Debian 2.6.30 kernel, it runs to completion in
about 23 minutes (occasionally hitting the CPU soft lockup problem
described in [1]). Is that a reasonable time for this benchmark to
complete in on this hardware?
When I run this on the Debian 2.6.32 kernel, 1 or 2 datanodes stop
responding to network traffic at some point during the run.
Logging into these nodes via the console reveals no messages in the
log files. Running ifdown eth0 followed by ifup eth0 brings these
systems back online. The systems that become unresponsive vary from run
to run, which suggests this is not a hardware problem specific to
certain nodes.
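To pin down exactly when a node drops off the network (and line that up with the job progress), a watcher along these lines could run from a machine outside the cluster. This is only a sketch: the check_nodes and probe functions and the datanodes.txt host list are hypothetical names, and it assumes the Linux iputils ping with its -W timeout option.

```shell
#!/bin/bash
# Hypothetical watcher: report datanodes that stop answering ping.

# probe HOST: succeed if HOST answers a single ping within 2 seconds.
# Kept as its own function so another check (e.g. a TCP connect) can be swapped in.
probe() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1 </dev/null
}

# check_nodes FILE: print one timestamped line for every host listed in FILE
# (one hostname per line, blank lines skipped) that fails the probe.
check_nodes() {
    local hosts_file=$1 host
    while read -r host; do
        [ -z "$host" ] && continue
        if ! probe "$host"; then
            echo "$(date '+%F %T') $host unresponsive"
        fi
    done < "$hosts_file"
}

# Example use (not run here): poll every 30 seconds for the length of the job.
#   while sleep 30; do check_nodes datanodes.txt; done >> node-watch.log
```

The timestamps in the resulting log can then be compared against the JobTracker progress output for the same run.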
I have raised this issue with the Debian kernel team [2] and have tested
various system and switch changes in an attempt to identify the cause,
but without success.
Has anyone run into similar problems in their environments? I noticed
that when the nodes become unresponsive, the TeraSort job is often at
map 100%, reduce 78%
Is there any significance to that?
Any feedback welcome (including comments on what distro/kernel
combinations others are using).
Thanks,
-stephen
[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=556030
[2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=572201
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com