Which version of Hadoop are you running? I'm pretty sure the problem is that you're overcommitting your RAM, and Hadoop really doesn't like swapping. I would try setting mapred.child.java.opts to -Xmx1024m.
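For reference, the change would look something like this in mapred-site.xml (just a sketch - tune the -Xmx value to your own memory budget; it's a job-level property, so you can also pass it per job with -D mapred.child.java.opts=-Xmx1024m if the job goes through ToolRunner):

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>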
-Joey

On Wed, May 11, 2011 at 2:23 AM, Evert Lammerts <evert.lamme...@sara.nl> wrote:
> Hi list,
>
> I notice that whenever our Hadoop installation is put under a heavy load we
> lose one or two (out of a total of five) datanodes. This results in
> IOExceptions and affects the overall performance of the job being run. Can
> anybody give me advice or best practices on a different configuration to
> increase the stability? Below I've included the specs of the cluster, the
> Hadoop-related config, and an example of what goes wrong and when. Any help
> is very much appreciated, and if I can provide any other info please let me
> know.
>
> Cheers,
> Evert
>
> == What goes wrong, and when ==
>
> See attached a screenshot of Ganglia when the cluster is under load from a
> single job. This job:
> * reads ~1TB from HDFS
> * writes ~200GB to HDFS
> * runs 288 Mappers and 35 Reducers
>
> When the job runs it takes all available Map and Reduce slots. The system
> starts swapping and there is a short time interval during which most cores
> are in WAIT. After that the job really starts running. At around halfway,
> one or two datanodes become unreachable and are marked as dead nodes. The
> amount of under-replicated blocks becomes huge. Then some
> "java.io.IOException: Could not obtain block" exceptions are thrown in
> Mappers. The job does manage to finish successfully after around 3.5 hours,
> but my fear is that when we make the input much larger - which we want - the
> system becomes too unstable to finish the job.
>
> Maybe worth mentioning - you never know what might help diagnostics: we
> notice that memory usage drops when we switch our keys from Text to
> LongWritable, and the Mappers finish in a fraction of the time. However,
> for some reason this results in much more network traffic and makes the
> Reducers extremely slow. We're working on figuring out what causes this.
>
> == The cluster ==
>
> We have a cluster that consists of 6 Sun Thumpers running Hadoop 0.20.2 on
> CentOS 5.5. One of them acts as NN and JT; the other 5 run DNs and TTs.
> Each node has:
> * 16GB RAM
> * 32GB swap space
> * 4 cores
> * 11 LVMs of 4 x 500GB disks (2TB in total) for HDFS
> * non-HDFS stuff on separate disks
> * a 2x1GE bonded network interface for interconnects
> * a 2x1GE bonded network interface for external access
>
> I realize that this is not a well-balanced system, but it's what we had
> available for a prototype environment. We're working on putting together a
> specification for a much larger production environment.
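Some quick arithmetic on those specs, assuming mapred.child.java.opts (-Xmx2560m, from your config below) is what the child JVMs actually honor - if I remember right, the mapreduce.map.java.opts / mapreduce.reduce.java.opts names (and the mapreduce.task.io.sort.* ones) were only introduced after 0.20.2, so on your version they'd be silently ignored:

  (4 map slots + 4 reduce slots) * 2560 MB = 20480 MB of task heap per node

That's ~20GB committed against 16GB of physical RAM, before you count the DataNode and TaskTracker JVMs and the OS itself. Once a node starts swapping, the DataNode can miss heartbeats long enough for the NameNode to declare it dead, which would explain the dead nodes, the under-replicated blocks, and the "Could not obtain block" errors all at once. With -Xmx1024m the same 8 slots commit 8GB and leave headroom for the daemons.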
>
> == Hadoop config ==
>
> Here are some properties that I think might be relevant:
>
> __CORE-SITE.XML__
>
> fs.inmemory.size.mb: 200
> mapreduce.task.io.sort.factor: 100
> mapreduce.task.io.sort.mb: 200
> # 1024*1024*4 bytes (4 MB), the blocksize of the LVMs
> io.file.buffer.size: 4194304
>
> __HDFS-SITE.XML__
>
> # 1024*1024*4*32 bytes (128 MB), 32 times the blocksize of the LVMs
> dfs.block.size: 134217728
> # Only 5 DNs, but this shouldn't hurt
> dfs.namenode.handler.count: 40
> # This got rid of the occasional "Could not obtain block" errors
> dfs.datanode.max.xcievers: 4096
>
> __MAPRED-SITE.XML__
>
> mapred.tasktracker.map.tasks.maximum: 4
> mapred.tasktracker.reduce.tasks.maximum: 4
> mapred.child.java.opts: -Xmx2560m
> mapreduce.reduce.shuffle.parallelcopies: 20
> mapreduce.map.java.opts: -Xmx512m
> mapreduce.reduce.java.opts: -Xmx512m
> # Compression codecs are configured and seem to work fine
> mapred.compress.map.output: true
> mapred.map.output.compression.codec: com.hadoop.compression.lzo.LzoCodec

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434