[ 
https://issues.apache.org/jira/browse/HADOOP-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637379#action_12637379
 ] 

Meng Mao commented on HADOOP-3466:
----------------------------------

I was able to increase the size of my input data (while maintaining a very 
large dfs.datanode.max.xcievers) until a new kind of error cropped up. While 
the number of datanode threads does not exceed the limit, a similar DOS-style 
crash happens to the grid. This time, some of the nodes' datanode processes die 
and take all the child threads with them. Is it safe to assume that the JVM is 
running out of memory trying to open up and maintain so many threads? Do you 
guys know what is "beyond the wall," as far as how Hadoop will fail if 
dfs.datanode.max.xcievers is jacked up arbitrarily high?

We had really hoped that, hack as it might be, that raising 
dfs.datanode.max.xcievers would be sufficient to cover the upper bound of the 
data we'd expect to see in a scaled out deployment of our grid, but the data 
set we used is not really pushing we expect to get. 

Regardless of whether our implementation is too naive and should be revised, 
does our particular problem case lend more weight to the need to do a rewrite 
as discussed in issue 3856? Or is the general stance to suck it up and work 
around the limit?

We greatly appreciate any input you guys have, as this is seriously holding up 
our project.

> DataNode fails to deliver blocks, holds thousands of open socket connections
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-3466
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3466
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0, 0.18.0
>         Environment: Hadoop-0.18.0, 7 node Linux grid (6 DataNodes, 1 master 
> node)
> Hadoop-0.18.0, 20 EC2 Linux grid (19 DataNodes, 1 master node)
>            Reporter: Christopher Gillett
>            Priority: Critical
>         Attachments: logs.tar.gz, logsforHadoopTeam.tar.gz
>
>
> 9/27 update: uploaded the logs, with hopefully all the bits that should be 
> examined. If other things are needed, just let me know. Note that all the 
> paths refer to 0.18.1. This is still an 18.0 installation using the 18.0 core 
> jar, just installed to a non-standard location.
> 9/26 update: we have successfully reproduced this using Hadoop 0.18 as well. 
> The problem happens on both our own network infrastructure as well as on an 
> Amazon EC2 cluster running CentOS5 images. I'll be attaching the logs Raghu 
> asked for shortly.
> A job that used to run correctly on our grid (in 0.15.0) now fails. The 
> failure occurs after the map phase is complete, and about 2/3rds of the way 
> through the reduce phase.   This job is processing a modest amount of input 
> data (approximately 220G)
> When the error occurs the nodes hosting DataNodes have literally thousands of 
> open socket connections on them.  The DataNode instances are holding large 
> amounts of memory.  Sometimes the DataNodes crash or exit, other times they 
> continue to run.
> The error which gets kicked out from the application perspective is:
> 08/05/27 11:30:08 INFO mapred.JobClient: map 100% reduce 89%
> 08/05/27 11:30:41 INFO mapred.JobClient: map 100% reduce 90%
> 08/05/27 11:32:45 INFO mapred.JobClient: map 100% reduce 86%
> 08/05/27 11:32:45 INFO mapred.JobClient: Task Id :
>  task_200805271056_0001_r_000007_0, Status : FAILED
> java.io.IOException: Could not get block locations. Aborting...
> at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanode
>  Error(DFSClient.java:1832)
> at
>  
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1100(DFSClient.java:1487)
> at
>  
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1579)
> I then discovered that 1 or more DataNode instances on the slave nodes
>  are down (we run 1 DataNode instance per machine). The cause for at
>  least some of the DataNode failures is a JVM internal error that gets
>  raised due to a complete out-of-memory scenario (on a 4G, 4-way machine). 
> Watching the DataNodes run, I can see them consuming more and more
>  memory. For those failures for which there is a JVM traceback, I see (in
>  part...NOTE 0.16.4 TRACEBACK):
> #
> # java.lang.OutOfMemoryError: requested 16 bytes for CHeapObj-new. Out
>  of swap space?
> #
> # Internal Error (414C4C4F434154494F4E0E494E4C494E450E4850500017),
>  pid=4246, tid=2283883408
> #
> # Java VM: Java HotSpot(TM) Server VM (1.6.0_02-b05 mixed mode)
> # If you would like to submit a bug report, please visit:
> # http://java.sun.com/webapps/bugreport/crash.jsp
> #
> --------------- T H R E A D ---------------
> Current thread (0x8a942000): JavaThread
>  "[EMAIL PROTECTED]" daemon [_thread_in_Java, id=15064]
> Stack: [0x881c4000,0x88215000), sp=0x882139e0, free space=318k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code,
>  C=native code)
> V [libjvm.so+0x53b707]
> V [libjvm.so+0x225fe1]
> V [libjvm.so+0x16fdc5]
> V [libjvm.so+0x22aef3]
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> v blob 0xf4f235a7
> J java.io.DataInputStream.readInt()I
> j
>  
> org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(Ljava/io/DataOutputStream;Ljava/io/DataInputStream;Ljava/io/DataOutputStream;Ljava/lang/String;Lorg/a
> pache/hadoop/dfs/DataNode$Throttler;I)V+126
> j
>  
> org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(Ljava/io/DataInputStream;)V+746
> j org.apache.hadoop.dfs.DataNode$DataXceiver.run()V+174
> j java.lang.Thread.run()V+11
> v ~StubRoutines::call_stub
> --------------- P R O C E S S ---------------
> Java Threads: ( => current thread )
> 0x0ae3f400 JavaThread "process reaper" daemon [_thread_blocked,
>  id=26870]
> 0x852e6000 JavaThread
>  "[EMAIL PROTECTED]" daemon [_thread_in_vm, id=26869]
> 0x08a1cc00 JavaThread "PacketResponder 0 for Block
>  blk_-6186975972786687394" daemon [_thread_blocked, id=26769]
> 0x852e5000 JavaThread
>  "[EMAIL PROTECTED]" daemon [_thread_in_native, id=26768]
> 0x0956e000 JavaThread "PacketResponder 0 for Block
>  blk_-2322514873363546651" daemon [_thread_blocked, id=26767]
> 0x852e4400 JavaThread
>  "[EMAIL PROTECTED]" daemon [_thread_in_native, id=26766]
> 0x09d3a400 JavaThread "PacketResponder 0 for Block
>  blk_8926941945313450801" daemon [_thread_blocked, id=26764]
> 0x852e3c00 JavaThread
>  "[EMAIL PROTECTED]" daemon [_thread_in_native, id=26763]
> 0x0953d000 JavaThread "PacketResponder 0 for Block
>  blk_4785883052769066976" daemon [_thread_blocked, id=26762]
> 0xb13a5c00 JavaThread
>  "[EMAIL PROTECTED]" daemon [_thread_in_native, id=26761]
> The interesting part here is that if I count the number of JavaThreads
>  running org.apache.hadoop.dfs.DataNode I see 4,538 (!) in the
>  traceback. The number of threads was surprising.
> Other DataNodes just exit without panicking the JVM. In either failure
>  mode, the last few lines of the DataNode log file is apparently
>  innocuous:
> 2008-05-27 11:31:47,663 INFO org.apache.hadoop.dfs.DataNode: Datanode 2
>  got response for connect ack from downstream datanode with
>  firstbadlink as
> 2008-05-27 11:31:47,663 INFO org.apache.hadoop.dfs.DataNode: Datanode 2
>  forwarding connect ack to upstream firstbadlink is
> 2008-05-27 11:31:48,268 INFO org.apache.hadoop.dfs.DataNode: Receiving
>  block blk_-2241766430103062484 src: /10.2.14.10:33626 dest:
>  /10.2.14.10:50010
> 2008-05-27 11:31:48,740 INFO org.apache.hadoop.dfs.DataNode: Receiving
>  block blk_313239508245918539 src: /10.2.14.24:37836 dest:
>  /10.2.14.24:50010
> 2008-05-27 11:31:48,740 INFO org.apache.hadoop.dfs.DataNode: Datanode 0
>  forwarding connect ack to upstream firstbadlink is
> 2008-05-27 11:31:49,044 INFO org.apache.hadoop.dfs.DataNode: Receiving
>  block blk_1684581399908730353 src: /10.2.14.16:51605 dest:
>  /10.2.14.16:50010
> 2008-05-27 11:31:49,044 INFO org.apache.hadoop.dfs.DataNode: Datanode 0
>  forwarding connect ack to upstream firstbadlink is
> 2008-05-27 11:31:49,509 INFO org.apache.hadoop.dfs.DataNode: Receiving
>  block blk_2493969670086107736 src: /10.2.14.18:47557 dest:
>  /10.2.14.18:50010
> 2008-05-27 11:31:49,513 INFO org.apache.hadoop.dfs.DataNode: Datanode 1
>  got response for connect ack from downstream datanode with
>  firstbadlink as
> 2008-05-27 11:31:49,513 INFO org.apache.hadoop.dfs.DataNode: Datanode 1
>  forwarding connect ack to upstream firstbadlink is
> Finally, the task-level output (in userlogs) doesn't reveal much
>  either:
> 2008-05-27 11:38:30,724 INFO org.apache.hadoop.mapred.ReduceTask:
>  task_200805271056_0001_r_000007_1 Need 34 map output(s)
> 2008-05-27 11:38:30,753 INFO org.apache.hadoop.mapred.ReduceTask:
>  task_200805271056_0001_r_000007_1 done copying
>  task_200805271056_0001_m_001976_0 output from worker9.
> 2008-05-27 11:38:31,727 INFO org.apache.hadoop.mapred.ReduceTask:
>  task_200805271056_0001_r_000007_1: Got 0 new map-outputs & 0 obsolete
>  map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-05-27 11:38:31,727 INFO org.apache.hadoop.mapred.ReduceTask:
>  task_200805271056_0001_r_000007_1 Got 33 known map output location(s);
>  scheduling...
> 2008-05-27 11:38:31,727 INFO org.apache.hadoop.mapred.ReduceTask:
>  task_200805271056_0001_r_000007_1 Scheduled 1 of 33 known outputs (0 slow
>  hosts and 32 dup hosts)
> 2008-05-27 11:38:31,727 INFO org.apache.hadoop.mapred.ReduceTask:
>  task_200805271056_0001_r_000007_1 Copying task_200805271056_0001_m_001248_0
>  output from worker8.
> 2008-05-27 11:38:31,727 INFO org.apache.hadoop.mapred.ReduceTask:
>  task_200805271056_0001_r_000007_1 Need 33 map output(s)
> 2008-05-27 11:38:31,752 INFO org.apache.hadoop.mapred.ReduceTask:
>  task_200805271056_0001_r_000007_1 done copying
>  task_200805271056_0001_m_001248_0 output from worker8.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to