[ https://issues.apache.org/jira/browse/HDFS-10977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15554136#comment-15554136 ]
Senthilkumar commented on HDFS-10977:
-------------------------------------

[~zhz], I ran into this issue too when I was working with the balancer last month:

2016-09-08 06:32:06,574 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to host/10.103.108.201:8020 from hadoop/host@DOMAIN sending #49230
2016-09-08 06:32:06,574 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getBlocks took 0ms
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to host/10.103.108.201:8020 from hadoop/host@DOMAIN sending #49231
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to host/10.103.108.201:8020 from hadoop/host@DOMAIN got value #49229
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getBlocks took 1ms
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to host/10.103.108.201:8020 from hadoop/host@DOMAIN got value #49230
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to host/10.103.108.201:8020 from hadoop/host@DOMAIN sending #49232
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getBlocks took 1ms
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to host/10.103.108.201:8020 from hadoop/host@DOMAIN sending #49233
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to host/10.103.108.201:8020 from hadoop/host@DOMAIN got value #49231
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getBlocks took 0ms
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to host/10.103.108.201:8020 from hadoop/host@DOMAIN sending #49234
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to host/10.103.108.201:8020 from hadoop/host@DOMAIN got value #49232
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getBlocks took 0ms
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to host/10.103.108.201:8020 from hadoop/host@DOMAIN sending #49235
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to host/10.103.108.201:8020 from hadoop/host@DOMAIN got value #49233
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getBlocks took 0ms
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to host/10.103.108.201:8020 from hadoop/host@DOMAIN got value #49234

The trace above is from a run that went on for a few hours without finding any blocks. I ended up restarting the balancer to make it work. In the end I started the balancer with the -include option (passing it the list of live DataNodes), and that helped. I am not sure whether including only the live nodes is the right solution, though. If it is, I think the problem is that the balancer also considers decommissioning/decommissioned nodes. Attaching the discussion I started some time back.

> Balancer should query NameNode with a timeout
> ---------------------------------------------
>
>                 Key: HDFS-10977
>                 URL: https://issues.apache.org/jira/browse/HDFS-10977
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-10977-reproduce.patch
>
>
> We found a case where {{Dispatcher}} was stuck at {{getBlockList}} *forever*
> (well, several hours when we found it).
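The issue title proposes that the balancer bound its NameNode queries with a timeout, so that a stuck {{getBlockList}} fails instead of hanging for hours. A minimal sketch of that general idea in plain Java, assuming the blocking query can be wrapped in a Callable: TimedNameNodeQuery and callWithTimeout are hypothetical names used only for illustration, and this is neither Hadoop's API nor the patch attached to HDFS-10977.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Hadoop: bounds a blocking call with a deadline.
public class TimedNameNodeQuery {

    // Runs 'call' on a worker thread and waits at most timeoutMs for its result.
    // 'call' stands in for a getBlocks()-style NameNode RPC (an assumption).
    static <T> T callWithTimeout(ExecutorService pool, Callable<T> call, long timeoutMs)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = pool.submit(call);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupt the stuck worker so the caller can retry or move on.
            future.cancel(true);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // A query that answers promptly returns its result normally.
            String fast = callWithTimeout(pool, () -> "blocks", 1000);
            System.out.println("fast call returned: " + fast);

            // A query that hangs is abandoned after 200 ms instead of blocking forever.
            try {
                callWithTimeout(pool, () -> { Thread.sleep(10_000); return "never"; }, 200);
            } catch (TimeoutException e) {
                System.out.println("slow call timed out");
            }
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Cancelling the future only frees the worker if the wrapped call responds to interruption; a call that ignores interrupts keeps its thread busy even though the caller has already given up.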