[
https://issues.apache.org/jira/browse/HADOOP-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12572790#action_12572790
]
rangadi edited comment on HADOOP-2907 at 2/26/08 8:59 PM:
---------------------------------------------------------------
Edit: typos
I looked at one of these dead datanodes. The OutOfMemoryErrors seem to be an
independent problem. These errors (there are multiple of them) are in the .out
file without timestamps. On this node, the .out file was modified at 01:39, and
the log file shows the DataNode continued to function normally for some time
after that.
The datanode seems to be stuck because one of its threads is waiting forever
for 'df' to return while holding a central lock (FSDataset), and there is a
zombie 'df' process on the machine. The offending stack trace:
{noformat}
"[EMAIL PROTECTED]" daemon prio=10 tid=0xae45e800 nid=0x2f3d in Object.wait()
[0x8cafe000..0x8caff030]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0xed78c5f8> (a java.lang.UNIXProcess$Gate)
at java.lang.Object.wait(Object.java:485)
at java.lang.UNIXProcess$Gate.waitForExit(UNIXProcess.java:64)
- locked <0xed78c5f8> (a java.lang.UNIXProcess$Gate)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:145)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:115)
at org.apache.hadoop.util.Shell.run(Shell.java:100)
at org.apache.hadoop.fs.DF.getCapacity(DF.java:63)
at org.apache.hadoop.dfs.FSDataset$FSVolume.getCapacity(FSDataset.java:307)
at org.apache.hadoop.dfs.FSDataset$FSVolume.getAvailable(FSDataset.java:311)
at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getNextVolume(FSDataset.java:393)
- locked <0xb6551838> (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:657)
- locked <0xb6551838> (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
- locked <0xb653aec8> (a org.apache.hadoop.dfs.FSDataset)
at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1983)
at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1074)
at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
at java.lang.Thread.run(Thread.java:619)
{noformat}
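To make the failure mode concrete, here is a minimal hypothetical sketch (not the Hadoop source; the class and field names are made up) of the pattern the stack trace shows: the external 'df' is forked and waited on while a shared monitor is held, so a 'df' that never exits wedges every thread that needs that monitor.
{noformat}
// Hypothetical sketch, not the actual Hadoop code.
public class VolumeSketch {
    private final Object volumeLock = new Object();   // stands in for the FSDataset lock

    public long getAvailable(String path) throws Exception {
        synchronized (volumeLock) {
            // Rough equivalent of DF.getCapacity(): fork "df -k <path>" and wait for it.
            Process p = new ProcessBuilder("df", "-k", path).start();
            p.waitFor();          // <-- parks here forever if 'df' never returns
            // ... parse the output (omitted) ...
            return 0L;
        }
    }

    public void writeToBlock(String path) {
        synchronized (volumeLock) {
            // block allocation etc.; stuck behind the wedged getAvailable() call above
        }
    }
}
{noformat}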
We also need to find out why there are multiple OutOfMemoryErrors. My guess is
that some of the normally functioning datanodes will have these as well.
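One possible direction (just a sketch under the assumption that a bounded wait is acceptable; this is not a fix agreed on for this issue) is to wait for 'df' with a timeout and kill it if it hangs, so a wedged child process cannot pin a shared lock forever. It relies on Process.waitFor(long, TimeUnit) and destroyForcibly(), which exist only on newer JDKs, and the helper name is made up.
{noformat}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of org.apache.hadoop.fs.DF.
public class BoundedDf {
    public static String run(String path, long timeoutSec)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder("df", "-k", path).start();
        // 'df' output is tiny, so it is safe to wait before draining stdout here.
        if (!p.waitFor(timeoutSec, TimeUnit.SECONDS)) {
            p.destroyForcibly();   // give up on a hung 'df' instead of waiting forever
            throw new IOException("df timed out after " + timeoutSec + "s");
        }
        StringBuilder out = new StringBuilder();
        try (BufferedReader r =
                 new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        return out.toString();
    }
}
{noformat}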
> dead datanodes because of OutOfMemoryError
> ------------------------------------------
>
> Key: HADOOP-2907
> URL: https://issues.apache.org/jira/browse/HADOOP-2907
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.16.0
> Reporter: Christian Kunz
>
> We see more dead datanodes than in previous releases. The common exception is found in the .out file:
> Exception in thread "[EMAIL PROTECTED]" java.lang.OutOfMemoryError: Java heap space
> Exception in thread "DataNode: [dfs.data.dir-value]" java.lang.OutOfMemoryError: Java heap space