Hung on hdfs: writeChunk, DFSClient.java:2126, DataStreamer socketWrite
-----------------------------------------------------------------------

                 Key: HADOOP-5904
                 URL: https://issues.apache.org/jira/browse/HADOOP-5904
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.19.1, 0.19.0, 0.18.3
            Reporter: stack


We've seen this hang only rarely, but when it happens it locks up the 
application.  We've seen it at least in 0.18.x and 0.19.x (we don't have much 
experience with 0.20.x HDFS yet).

Here we're doing a SequenceFile#append:

{code}
"IPC Server handler 9 on 60020" daemon prio=10 tid=0x00007fef1c3f0400 
nid=0x7470 waiting for monitor entry [0x0000000042d18000..0x0000000042d189f0]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2486)
        - waiting to lock <0x00007fef38ecc138> (a java.util.LinkedList)
        - locked <0x00007fef38ecbdb8> (a 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
        at 
org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
        at 
org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
        - locked <0x00007fef38ecbdb8> (a 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
        at 
org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
        - locked <0x00007fef38ecbdb8> (a 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
        at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
        at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
        - locked <0x00007fef38ecbdb8> (a 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
        at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        - locked <0x00007fef38e09fc0> (a 
org.apache.hadoop.fs.FSDataOutputStream)
        at 
org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1016)
        - locked <0x00007fef38e09f30> (a 
org.apache.hadoop.io.SequenceFile$Writer)
        at 
org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:980)
        - locked <0x00007fef38e09f30> (a 
org.apache.hadoop.io.SequenceFile$Writer)
        at org.apache.hadoop.hbase.regionserver.HLog.doWrite(HLog.java:461)
        at org.apache.hadoop.hbase.regionserver.HLog.append(HLog.java:421)
        - locked <0x00007fef29ad9588> (a java.lang.Integer)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.update(HRegion.java:1676)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1439)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1378)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1184)
        at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:622)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
{code}
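
For context, here is a minimal sketch of the kind of SequenceFile#append call that drives the write path in the trace above.  This is illustrative only: the real call site is HLog.doWrite(), and the path and key/value types below are placeholders, not the actual HLog classes.

{code}
// Illustrative sketch only -- not the real HLog code.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class AppendSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path("/tmp/hlog.dat.example"), Text.class, Text.class);
    // Each append runs down through FSDataOutputStream and FSOutputSummer
    // into DFSOutputStream.writeChunk(), which is where the handler blocks.
    writer.append(new Text("row"), new Text("edit"));
    writer.close();
  }
}
{code}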

The DataStreamer that is supposed to be servicing the above writeChunk is stuck 
here:

{code}
"DataStreamer for file 
/hbase/log_72.34.249.212_1225407466779_60020/hlog.dat.1227075571390 block 
blk_-7436808403424765554_553837" daemon prio=10 tid=0x0000000001c84c00 
nid=0x7125 in Object.wait() [0x00000000409b3000..0x00000000409b3d70]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at org.apache.hadoop.ipc.Client.call(Client.java:709)
        - locked <0x00007fef39520bb8> (a org.apache.hadoop.ipc.Client$Call)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at org.apache.hadoop.dfs.$Proxy4.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:306)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:343)
        at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:288)
        at 
org.apache.hadoop.dfs.DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:139)
        at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2185)
        at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
        at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
        - locked <0x00007fef38ecc138> (a java.util.LinkedList)
{code}

The writeChunk is trying to synchronize on dataQueue.

The dataQueue monitor is held by DataStreamer#run, which is down in 
processDatanodeError trying to recover from a problem with a block.
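
In other words, the two threads are coupled roughly like this.  The sketch below is a simplified illustration of the locking pattern, not the actual DFSClient code; the names are only meant to mirror the trace above.

{code}
// Simplified sketch of the lock interaction -- not the real DFSClient code.
class LockingSketch {
  private final java.util.LinkedList<byte[]> dataQueue =
      new java.util.LinkedList<byte[]>();

  // Caller thread: the IPC handler doing SequenceFile#append.
  // It holds the stream monitor, then waits for the dataQueue monitor.
  synchronized void writeChunk(byte[] packet) {
    synchronized (dataQueue) {   // blocked: the streamer thread holds dataQueue
      dataQueue.addLast(packet);
      dataQueue.notifyAll();
    }
  }

  // DataStreamer thread: holds the dataQueue monitor across error recovery.
  void streamerLoop() {
    synchronized (dataQueue) {
      // processDatanodeError() -> createClientDatanodeProtocolProxy()
      // -> RPC.waitForProxy(), which can wait indefinitely.
      processDatanodeError();
    }
  }

  void processDatanodeError() { /* stuck in an RPC that never returns */ }
}
{code}

As long as that recovery RPC never returns, the dataQueue monitor is never released, so the handler thread blocked in writeChunk never makes progress and the application appears hung.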

Another example of the hang and some more detail can be found over in HBASE-667.
