[
https://issues.apache.org/jira/browse/HDFS-11377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15853559#comment-15853559
]
Hudson commented on HDFS-11377:
-------------------------------
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11213 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/11213/])
HDFS-11377. Balancer hung due to no available mover threads. Contributed
(yqlin: rev 9cbbd1eae893b21212c9bc9e6745c6859317a667)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java
> Balancer hung due to no available mover threads
> -----------------------------------------------
>
> Key: HDFS-11377
> URL: https://issues.apache.org/jira/browse/HDFS-11377
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Affects Versions: 2.7.3
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HDFS-11377.001.patch, HDFS-11377.002.patch
>
>
> When running the balancer on a large cluster with more than 3000 DataNodes,
> it may hang after reporting "No mover threads available".
> The stack trace below shows the main thread stuck in waitForMoveCompletion.
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x00007ff6cc014800 nid=0x6b2c waiting on condition [0x00007ff6d1bad000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1043)
>     at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1017)
>     at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:981)
>     at org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:611)
>     at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:663)
>     at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:776)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>     at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:905)
> {code}
> The log contains many WARN messages about "No mover threads available".
> {quote}
> 2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_13700554102_1112815018180 with size=268435456 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.137:50010
> 2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_4009558842_1103118359883 with size=268435456 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.137:50010
> 2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_13881956058_1112996460026 with size=133509566 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.36:50010
> {quote}
> When no mover threads are available, DDatanode.isPendingQEmpty() keeps
> returning false, so waitForMoveCompletion never finishes and the Balancer hangs.
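
The failure mode described above can be reproduced in miniature. The following is a minimal sketch, not the Hadoop code: MoverHangSketch, PendingQueue, Move and drains are hypothetical names, and the single-thread pool with a SynchronousQueue is an assumption chosen so that a second submission is always rejected. It shows that a move which is queued as pending and then rejected by the executor stays in the queue unless it is removed explicitly. Removing it in the rejection handler is one plausible remedy and is not necessarily what the committed patch does.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative model only; none of these types are Hadoop classes.
final class MoverHangSketch {

    /** Hypothetical stand-in for a DataNode's queue of pending block moves. */
    static final class PendingQueue {
        private final Queue<Runnable> moves = new ArrayDeque<>();
        synchronized void add(Runnable move) { moves.add(move); }
        synchronized void remove(Runnable move) { moves.remove(move); }
        synchronized boolean isEmpty() { return moves.isEmpty(); }
    }

    /** A move that holds its thread until the gate opens, then dequeues itself. */
    static final class Move implements Runnable {
        private final PendingQueue pending;
        private final CountDownLatch gate;

        Move(PendingQueue pending, CountDownLatch gate) {
            this.pending = pending;
            this.gate = gate;
        }

        @Override
        public void run() {
            try {
                gate.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                pending.remove(this);
            }
        }
    }

    /** Returns true if the pending queue is empty after all accepted moves finish. */
    static boolean drains(boolean cleanUpOnReject) {
        // One mover thread and no task queue: the second submission is rejected.
        ThreadPoolExecutor movers = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<>());
        PendingQueue pending = new PendingQueue();
        CountDownLatch gate = new CountDownLatch(1);
        try {
            for (int i = 0; i < 2; i++) {
                Move move = new Move(pending, gate);
                pending.add(move);
                try {
                    movers.execute(move);
                } catch (RejectedExecutionException e) {
                    // Corresponds to "No mover threads available": the move is skipped.
                    if (cleanUpOnReject) {
                        pending.remove(move);
                    }
                }
            }
            gate.countDown();            // let the accepted move finish
            movers.shutdown();
            movers.awaitTermination(5, TimeUnit.SECONDS);
            // Bounded version of the wait loop; an unbounded loop would spin forever.
            for (int i = 0; i < 5 && !pending.isEmpty(); i++) {
                Thread.sleep(10);
            }
            return pending.isEmpty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            movers.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("drains without cleanup: " + drains(false)); // false
        System.out.println("drains with cleanup: " + drains(true));     // true
    }
}
```

The wait loop is deliberately bounded here so the sketch terminates. With an unbounded loop, the first case would sleep indefinitely, which matches the TIMED_WAITING (sleeping) state in the stack trace.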