[ https://issues.apache.org/jira/browse/HADOOP-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12572790#action_12572790 ]

rangadi edited comment on HADOOP-2907 at 2/26/08 8:59 PM:
---------------------------------------------------------------

Edit: typos

I looked at one of these dead datanodes. The OutOfMemoryErrors seem to be an 
independent problem. These errors (there are multiple of them) are in the .out 
file, without timestamps. On this node the .out file was modified at 01:39, and 
the log file shows the DataNode continued to function normally for some time 
after that.

The datanode seems to be stuck because one of its threads is waiting forever 
for 'df' to return while holding a central lock (FSDataset), and there is a 
zombie df process on the machine (a simplified sketch of this pattern follows 
the stack trace below). The offending stack trace:

{noformat}
"[EMAIL PROTECTED]" daemon prio=10 tid=0xae45e800 nid=0x2f3d in Object.wait() [0x8cafe000..0x8caff030]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xed78c5f8> (a java.lang.UNIXProcess$Gate)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.UNIXProcess$Gate.waitForExit(UNIXProcess.java:64)
        - locked <0xed78c5f8> (a java.lang.UNIXProcess$Gate)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:145)
        at java.lang.ProcessImpl.start(ProcessImpl.java:65)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:115)
        at org.apache.hadoop.util.Shell.run(Shell.java:100)
        at org.apache.hadoop.fs.DF.getCapacity(DF.java:63)
        at org.apache.hadoop.dfs.FSDataset$FSVolume.getCapacity(FSDataset.java:307)
        at org.apache.hadoop.dfs.FSDataset$FSVolume.getAvailable(FSDataset.java:311)
        at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getNextVolume(FSDataset.java:393)
        - locked <0xb6551838> (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
        at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:657)
        - locked <0xb6551838> (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
        - locked <0xb653aec8> (a org.apache.hadoop.dfs.FSDataset)
        at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1983)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1074)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
        at java.lang.Thread.run(Thread.java:619)
{noformat}
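
To make the failure mode concrete, here is a minimal sketch (my own illustration, not the actual FSDataset code; the class and method names below are invented): the write path takes the shared monitor, then forks a 'df' process and blocks until it exits. If 'df' hangs or turns into a zombie whose exit is never observed, the monitor is never released and every other thread that needs it blocks.

{noformat}
// Simplified illustration of the hang (not Hadoop's code; names are invented).
import java.io.IOException;

public class DfUnderLockSketch {

    // Stands in for the FSDataset/FSVolumeSet monitors held in writeToBlock().
    private final Object volumeLock = new Object();

    // Every writer funnels through this lock.
    public void writeToBlock() throws IOException, InterruptedException {
        synchronized (volumeLock) {
            long availableKb = queryDf("/data");   // 'df' runs while the lock is held
            // ... choose a volume based on availableKb ...
        }
    }

    // Forks 'df -k <dir>' and waits for it to exit -- this is the wait seen in
    // UNIXProcess$Gate.waitForExit in the stack trace. If the child never exits,
    // this never returns and volumeLock is never released.
    private long queryDf(String dir) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("df", "-k", dir).start();
        p.waitFor();
        // ... parse p.getInputStream() for the available-KB column ...
        return 0L;
    }
}
{noformat}

Once a thread wedges like this, the dump above is what you get: one thread WAITING in UNIXProcess$Gate.waitForExit with the FSDataset monitors locked, and the remaining DataXceivers queued behind it, so the datanode looks dead even though the JVM is still up.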

We also need to find out why there are multiple OutOfMemoryErrors. My guess is 
that some of the normally functioning datanodes will have these as well.


> dead datanodes because of OutOfMemoryError
> ------------------------------------------
>
>                 Key: HADOOP-2907
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2907
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Christian Kunz
>
> We see more dead datanodes than in previous releases. The common exception is found in the .out file:
> Exception in thread "[EMAIL PROTECTED]" java.lang.OutOfMemoryError: Java heap space
> Exception in thread "DataNode: [dfs.data.dir-value]" java.lang.OutOfMemoryError: Java heap space
