[
https://issues.apache.org/jira/browse/HADOOP-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592387#action_12592387
]
André Martin commented on HADOOP-3197:
--------------------------------------
I recently identified a "bad" datanode in our cluster: bad in the sense that
the JVM on that datanode (IBM Java 6 for PPC) seemed to consume more open
file handles than the regular Sun JRE. This caused a lot of "too many
open files" exceptions, and all writers got blocked whenever this specific
datanode was part of the write pipeline. Maybe this is related to HADOOP-3051
for some JVMs? Taking this datanode out seems to have resolved the issue.
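
For anyone who wants to compare file handle usage between JVMs, here is a
minimal sketch of how a process could report its own open descriptor count.
It is not part of Hadoop; the class name OpenFdCount is made up for this
example, and it assumes a Linux host where /proc/self/fd is available.

```java
import java.io.File;

// Hypothetical helper, not Hadoop code: reports how many file descriptors
// the current JVM process has open, assuming a Linux-style /proc filesystem.
public class OpenFdCount {

    // Returns the number of entries in /proc/self/fd, or -1 if that directory
    // cannot be listed (for example on a platform without /proc).
    // Listing the directory itself briefly opens one extra descriptor,
    // so the result may be one higher than the steady-state count.
    public static int countOpenFds() {
        String[] entries = new File("/proc/self/fd").list();
        return entries == null ? -1 : entries.length;
    }

    public static void main(String[] args) {
        System.out.println("open file descriptors: " + countOpenFds());
    }
}
```

Logging this value periodically on each datanode, under both JVMs and the
same load, should show whether one of them really holds more handles open.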
> Deadlock in DFCClient
> ---------------------
>
> Key: HADOOP-3197
> URL: https://issues.apache.org/jira/browse/HADOOP-3197
> Project: Hadoop Core
> Issue Type: Bug
> Affects Versions: 0.16.1
> Reporter: André Martin
>
> The DFS client hangs. The thread dump is attached; it looks like a deadlock
> to me.
> {noformat}
> "ResponseProcessor for block blk_-7822837545361798562" prio=10 tid=0x00002aab993dcc00 nid=0x5241 waiting for monitor entry [0x000000004365e000..0x000000004365ecc0]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:1771)
>     - waiting to lock <0x00002aaaecf2dd08> (a java.util.LinkedList)
>
> "DataStreamer for file /seDNS/mapred-out/18A59C65A91D44E5BA24785DF103D1781BB0137E.cache.new block blk_-7822837545361798562" prio=10 tid=0x00002aab96a46000 nid=0x523f runnable [0x000000004345c000..0x000000004345cc40]
>    java.lang.Thread.State: RUNNABLE
>     at java.net.SocketOutputStream.socketWrite0(Native Method)
>     at java.net.SocketOutputStream.socketWrite(Unknown Source)
>     at java.net.SocketOutputStream.write(Unknown Source)
>     at java.io.BufferedOutputStream.write(Unknown Source)
>     - locked <0x00002aaaecf2ec50> (a java.io.BufferedOutputStream)
>     at java.io.DataOutputStream.write(Unknown Source)
>     - locked <0x00002aaaecf2ec20> (a java.io.DataOutputStream)
>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1623)
>     - locked <0x00002aaaecf2dd08> (a java.util.LinkedList)
>
> "BackupJobQueuesThread" prio=10 tid=0x00002aab94b94000 nid=0x7cb2 waiting for monitor entry [0x000000004244c000..0x000000004244cd40]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2117)
>     - waiting to lock <0x00002aaaecf2dd08> (a java.util.LinkedList)
>     at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:141)
>     at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:124)
>     - locked <0x00002aaaecf2e670> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
>     at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:58)
>     - locked <0x00002aaaecf2e670> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:36)
>     at java.io.DataOutputStream.writeBytes(Unknown Source)
>     at sedns.serializer.file.FileSerializerServer.serializeJobQueuesAndCache(FileSerializerServer.java:723)
>     - locked <0x00002aaab430fec8> (a java.util.Collections$SynchronizedSet)
>     at sedns.pastry.application.ServerApp$BackupJobListThread.run(ServerApp.java:476)
>
> "[EMAIL PROTECTED]" daemon prio=10 tid=0x00002aab94bc7c00 nid=0x7ca7 waiting on condition [0x0000000041941000..0x0000000041941bc0]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:597)
>     at java.lang.Thread.run(Unknown Source)
> {noformat}
>
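
The thread dump above shows the DataStreamer holding the monitor on the
LinkedList <0x00002aaaecf2dd08> while it sits in a socket write, and the
ResponseProcessor and the application thread both waiting for that same
monitor. The following is only a sketch of that pattern, not the DFSClient
code: the class, thread and variable names are invented, and a latch stands
in for the socket write that does not return (so the holder shows up as
WAITING here instead of RUNNABLE in native code).

```java
import java.util.LinkedList;
import java.util.concurrent.CountDownLatch;

// Illustration only, not DFSClient code: one thread holds a monitor while
// stuck in a (simulated) blocking write, so a second thread that needs the
// same monitor goes BLOCKED, as in the thread dump.
public class LockHeldDuringIo {

    // Returns the observed state of the thread that is waiting for the monitor.
    public static Thread.State observeWaiterState() throws InterruptedException {
        final LinkedList<byte[]> dataQueue = new LinkedList<>();
        final CountDownLatch holderHasLock = new CountDownLatch(1);
        final CountDownLatch releaseHolder = new CountDownLatch(1);

        // Plays the role of the DataStreamer: takes the queue monitor,
        // then blocks without releasing it.
        Thread streamer = new Thread(() -> {
            synchronized (dataQueue) {
                holderHasLock.countDown();
                try {
                    releaseHolder.await(); // stand-in for the stuck socket write
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "streamer");

        // Plays the role of the application thread in writeChunk():
        // needs the same monitor to enqueue data.
        Thread writer = new Thread(() -> {
            synchronized (dataQueue) {
                dataQueue.add(new byte[0]);
            }
        }, "writer");

        streamer.start();
        holderHasLock.await();   // make sure the streamer owns the monitor first
        writer.start();

        // Poll (up to 5 seconds) until the writer is parked on the monitor.
        Thread.State state = writer.getState();
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (state != Thread.State.BLOCKED && System.nanoTime() < deadline) {
            Thread.sleep(10);
            state = writer.getState();
        }

        releaseHolder.countDown(); // let the "write" finish so both threads exit
        streamer.join();
        writer.join();
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("writer state while streamer holds the lock: "
                + observeWaiterState());
    }
}
```

If that reading of the dump is right, the client is stalled behind a write
that never completes, not caught in a lock-ordering cycle, which would fit
a datanode that has stopped accepting data.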