[ https://issues.apache.org/jira/browse/HDFS-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Ryan updated HDFS-871:
-----------------------------
Attachment: balancer-jstack.out
jstack output from hung balancer
> Balancer can hang in PendingBlockMove
> -------------------------------------
>
> Key: HDFS-871
> URL: https://issues.apache.org/jira/browse/HDFS-871
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer
> Affects Versions: 0.20.1
> Environment: Yahoo 0.20
> Reporter: Andrew Ryan
> Attachments: balancer-jstack.out
>
>
> We started the balancer with default options (-threshold 10). It ran fine
> for a few hours and then hung: the process was still alive, but no balancing
> was taking place.
> At the time of the hang, jstack showed three threads in the RUNNABLE state.
> Subsequent jstacks taken minutes and hours later showed the same three
> threads at the same place, so I don't think the requests were being
> restarted; they appear to be hung. My best guess is that these requests to
> the namenode have no timeout, and they need one.
> I'll attach the full jstack output, but here's a sample thread. They are all
> stuck in the same place.
> "pool-1-thread-972" prio=10 tid=0x00002aaafc23a800 nid=0x27a8 runnable [0x00002aab0a9a2000]
> java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:129)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> - locked <0x00002aaaebdbe158> (a java.io.BufferedInputStream)
> at java.io.DataInputStream.readShort(DataInputStream.java:295)
> at org.apache.hadoop.hdfs.server.balancer.Balancer$PendingBlockMove.receiveResponse(Balancer.java:371)
> at org.apache.hadoop.hdfs.server.balancer.Balancer$PendingBlockMove.dispatch(Balancer.java:326)
> at org.apache.hadoop.hdfs.server.balancer.Balancer$PendingBlockMove.access$1800(Balancer.java:232)
> at org.apache.hadoop.hdfs.server.balancer.Balancer$PendingBlockMove$1.run(Balancer.java:393)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
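
The trace above shows DataInputStream.readShort() blocked inside socketRead0, which is what a socket read with no timeout looks like when the peer never answers. As a rough illustration only (this is not the Balancer code, and the class and method names below are hypothetical), the following sketch shows how Socket.setSoTimeout() turns such an indefinite block into a SocketTimeoutException. It assumes a local peer that accepts the TCP connection but never sends a reply.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; not part of Hadoop.
public class ReadTimeoutSketch {

    // Returns true if the read gave up with a timeout instead of blocking.
    // timeoutMillis must be > 0; a value of 0 means "wait forever".
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        // Stand-in for a peer that completes the TCP handshake (via the
        // listen backlog) but never writes a response.
        try (ServerSocket silentPeer =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket sock = new Socket(silentPeer.getInetAddress(),
                                      silentPeer.getLocalPort())) {
            // Without this call, the readShort() below would block
            // indefinitely, as in the jstack output.
            sock.setSoTimeout(timeoutMillis);
            DataInputStream in = new DataInputStream(
                new BufferedInputStream(sock.getInputStream()));
            try {
                in.readShort();
                return false; // the peer answered (not expected here)
            } catch (SocketTimeoutException e) {
                return true;  // the read was abandoned after the timeout
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(500));
    }
}
```

Whether a timeout is the right fix for this issue, and what value it should have, is a separate question; the sketch only shows the mechanism the report suggests is missing.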