Which version of Hadoop are you running? I'm pretty sure the problem is that you're overcommitting your RAM, and Hadoop really doesn't like swapping. I would try setting mapred.child.java.opts to -Xmx1024m.
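For reference, the change would look something like this in mapred-site.xml (just a sketch - tune the -Xmx value to your own memory budget; it's a job-level property, so you can also pass it per job with -D mapred.child.java.opts=-Xmx1024m if the job goes through ToolRunner):

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>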
-Joey

On Wed, May 11, 2011 at 2:23 AM, Evert Lammerts <evert.lamme...@sara.nl> wrote:
> Hi list,
>
> I notice that whenever our Hadoop installation is put under a heavy load we
> lose one or two (out of a total of five) datanodes. This results in
> IOExceptions and affects the overall performance of the job being run. Can
> anybody give me advice or best practices on a different configuration to
> increase the stability? Below I've included the specs of the cluster, the
> Hadoop-related config, and an example of what goes wrong and when. Any help
> is very much appreciated, and if I can provide any other info please let me
> know.
>
> Cheers,
> Evert
>
> == What goes wrong, and when ==
>
> See attached a screenshot of Ganglia when the cluster is under load from a
> single job. This job:
> * reads ~1TB from HDFS
> * writes ~200GB to HDFS
> * runs 288 Mappers and 35 Reducers
>
> When the job runs it takes all available Map and Reduce slots. The system
> starts swapping and there is a short time interval during which most cores
> are in WAIT. After that the job really starts running. At around halfway,
> one or two datanodes become unreachable and are marked as dead nodes. The
> amount of under-replicated blocks becomes huge. Then some
> "java.io.IOException: Could not obtain block" exceptions are thrown in
> Mappers. The job does manage to finish successfully after around 3.5 hours,
> but my fear is that when we make the input much larger - which we want - the
> system becomes too unstable to finish the job.
>
> Maybe worth mentioning - you never know what might help diagnostics: we
> notice that memory usage drops when we switch our keys from Text to
> LongWritable, and the Mappers finish in a fraction of the time. However,
> for some reason this results in much more network traffic and makes the
> Reducers extremely slow. We're working on figuring out what causes this.
>
> == The cluster ==
>
> We have a cluster that consists of 6 Sun Thumpers running Hadoop 0.20.2 on
> CentOS 5.5. One of them acts as NN and JT; the other 5 run DNs and TTs.
> Each node has:
> * 16GB RAM
> * 32GB swap space
> * 4 cores
> * 11 LVMs of 4 x 500GB disks (2TB in total) for HDFS
> * non-HDFS stuff on separate disks
> * a 2x1GE bonded network interface for interconnects
> * a 2x1GE bonded network interface for external access
>
> I realize that this is not a well-balanced system, but it's what we had
> available for a prototype environment. We're working on putting together a
> specification for a much larger production environment.
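Some quick arithmetic on those specs, assuming mapred.child.java.opts (-Xmx2560m, from your config below) is what the child JVMs actually honor - if I remember right, the mapreduce.map.java.opts / mapreduce.reduce.java.opts names (and the mapreduce.task.io.sort.* ones) were only introduced after 0.20.2, so on your version they'd be silently ignored:

  (4 map slots + 4 reduce slots) * 2560 MB = 20480 MB of task heap per node

That's ~20GB committed against 16GB of physical RAM, before you count the DataNode and TaskTracker JVMs and the OS itself. Once a node starts swapping, the DataNode can miss heartbeats long enough for the NameNode to declare it dead, which would explain the dead nodes, the under-replicated blocks, and the "Could not obtain block" errors all at once. With -Xmx1024m the same 8 slots commit 8GB and leave headroom for the daemons.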
>
> == Hadoop config ==
>
> Here are some properties that I think might be relevant:
>
> __CORE-SITE.XML__
>
> fs.inmemory.size.mb: 200
> mapreduce.task.io.sort.factor: 100
> mapreduce.task.io.sort.mb: 200
> # 1024*1024*4 bytes (4 MB), the blocksize of the LVMs
> io.file.buffer.size: 4194304
>
> __HDFS-SITE.XML__
>
> # 1024*1024*4*32 bytes (128 MB), 32 times the blocksize of the LVMs
> dfs.block.size: 134217728
> # Only 5 DNs, but this shouldn't hurt
> dfs.namenode.handler.count: 40
> # This got rid of the occasional "Could not obtain block" errors
> dfs.datanode.max.xcievers: 4096
>
> __MAPRED-SITE.XML__
>
> mapred.tasktracker.map.tasks.maximum: 4
> mapred.tasktracker.reduce.tasks.maximum: 4
> mapred.child.java.opts: -Xmx2560m
> mapreduce.reduce.shuffle.parallelcopies: 20
> mapreduce.map.java.opts: -Xmx512m
> mapreduce.reduce.java.opts: -Xmx512m
> # Compression codecs are configured and seem to work fine
> mapred.compress.map.output: true
> mapred.map.output.compression.codec: com.hadoop.compression.lzo.LzoCodec

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434