Balancer can hang in PendingBlockMove
-------------------------------------

                 Key: HDFS-871
                 URL: https://issues.apache.org/jira/browse/HDFS-871
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: balancer
    Affects Versions: 0.20.1
         Environment: Yahoo 0.20
            Reporter: Andrew Ryan


We started the balancer with default options (-threshold 10). It ran fine for 
a few hours, then hung: the process was still alive, but no balancing was 
taking place.

At the time of the hang, jstack showed three threads in the RUNNABLE state. 
Subsequent jstacks taken minutes and hours later showed the same three threads 
at the same place, so I don't think the requests were being restarted; they 
look hung. My best guess is that there is no timeout on these requests to the 
namenode, and there needs to be.

I'll attach the full jstack output, but here's a sample thread; they are all 
stuck in the same place.

"pool-1-thread-972" prio=10 tid=0x00002aaafc23a800 nid=0x27a8 runnable [0x00002aab0a9a2000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        - locked <0x00002aaaebdbe158> (a java.io.BufferedInputStream)
        at java.io.DataInputStream.readShort(DataInputStream.java:295)
        at org.apache.hadoop.hdfs.server.balancer.Balancer$PendingBlockMove.receiveResponse(Balancer.java:371)
        at org.apache.hadoop.hdfs.server.balancer.Balancer$PendingBlockMove.dispatch(Balancer.java:326)
        at org.apache.hadoop.hdfs.server.balancer.Balancer$PendingBlockMove.access$1800(Balancer.java:232)
        at org.apache.hadoop.hdfs.server.balancer.Balancer$PendingBlockMove$1.run(Balancer.java:393)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
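
To illustrate the suspected cause, here is a minimal standalone sketch (not 
Balancer code) of how a read timeout changes the behavior seen in the trace. 
It assumes the hang is a blocking DataInputStream.readShort() on a socket 
whose peer never replies. The class name ReadTimeoutSketch and the helper 
readTimesOut are hypothetical; only standard java.net and java.io calls are 
used. With Socket.setSoTimeout() set, the read throws SocketTimeoutException 
instead of blocking indefinitely.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutSketch {

    // Hypothetical helper, not part of Hadoop: returns true if the read
    // timed out rather than blocking forever.
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        // The listening socket never calls accept() or writes anything, so it
        // stands in for a peer that never sends a response. The connection
        // still completes because the OS queues it in the listen backlog.
        try (ServerSocket silentPeer =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket sock = new Socket(silentPeer.getInetAddress(),
                                      silentPeer.getLocalPort())) {
            // Without this call the timeout is 0 (infinite) and readShort()
            // below would block indefinitely, as in the jstack output.
            sock.setSoTimeout(timeoutMillis);
            DataInputStream in = new DataInputStream(sock.getInputStream());
            try {
                in.readShort();
                return false; // the peer unexpectedly sent data
            } catch (SocketTimeoutException e) {
                return true;  // the read gave up after timeoutMillis
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(500));
    }
}
```

Whether the balancer should retry, skip the block, or abort after such a 
timeout is a separate design question that this sketch does not address.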

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
