[ https://issues.apache.org/jira/browse/HDFS-10977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhe Zhang updated HDFS-10977:
-----------------------------
    Attachment: HDFS-10977-reproduce.patch

Uploading a patch to demonstrate the issue (not the solution). Log from running the test:
{code}
2016-10-06 16:27:19,627 [DataXceiver for client DFSClient_NONMAPREDUCE_-1700201955_1 at /127.0.0.1:54189 [Receiving block BP-1747605867-172.21.144.175-1475796435325:blk_1073741825_1001]] INFO datanode.DataNode (DataXceiver.java:writeBlock(705)) - Receiving BP-1747605867-172.21.144.175-1475796435325:blk_1073741825_1001 src: /127.0.0.1:54189 dest: /127.0.0.1:54178
2016-10-06 16:27:19,654 [DataXceiver for client DFSClient_NONMAPREDUCE_-1700201955_1 at /127.0.0.1:54190 [Receiving block BP-1747605867-172.21.144.175-1475796435325:blk_1073741825_1001]] INFO datanode.DataNode (DataXceiver.java:writeBlock(705)) - Receiving BP-1747605867-172.21.144.175-1475796435325:blk_1073741825_1001 src: /127.0.0.1:54190 dest: /127.0.0.1:54183
2016-10-06 16:27:19,660 [DataXceiver for client DFSClient_NONMAPREDUCE_-1700201955_1 at /127.0.0.1:54191 [Receiving block BP-1747605867-172.21.144.175-1475796435325:blk_1073741825_1001]] INFO datanode.DataNode (DataXceiver.java:writeBlock(705)) - Receiving BP-1747605867-172.21.144.175-1475796435325:blk_1073741825_1001 src: /127.0.0.1:54191 dest: /127.0.0.1:54174
2016-10-06 16:27:19,703 [IPC Server handler 2 on 54173] INFO hdfs.StateChange (FSNamesystem.java:fsync(3027)) - BLOCK* fsync: /system/balancer.id for DFSClient_NONMAPREDUCE_-1700201955_1
2016-10-06 16:32:19,760 [IPC Server handler 5 on 54173] WARN BlockStateChange (BlockManager.java:getBlocksWithLocations(1269)) - BLOCK* getBlocks: Asking for blocks from an unrecorded node null:0
{code}
So the connector waited 5 minutes (from 16:27:19 to 16:32:19 in the log above).

> Balancer should query NameNode with a timeout
> ---------------------------------------------
>
>                 Key: HDFS-10977
>                 URL: https://issues.apache.org/jira/browse/HDFS-10977
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-10977-reproduce.patch
>
>
> We found a case where {{Dispatcher}} was stuck at {{getBlockList}} *forever*
> (well, several hours when we found it).
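
For illustration only, here is one generic way to put an upper bound on a blocking call such as the {{getBlockList}} query described above. This is a minimal sketch using plain java.util.concurrent, not the fix for this issue and not Hadoop code: the class name TimedCall and the method callWithTimeout are hypothetical, and the real Balancer may be better served by an RPC-level timeout.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of Hadoop): runs a blocking call with a deadline. */
final class TimedCall {
  private TimedCall() {}

  /**
   * Runs {@code call} on a worker thread and waits at most {@code timeout}.
   * Throws TimeoutException if the call has not returned by then.
   */
  public static <T> T callWithTimeout(Callable<T> call, long timeout, TimeUnit unit)
      throws InterruptedException, ExecutionException, TimeoutException {
    // Daemon thread, so a call that never returns cannot keep the JVM alive.
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "timed-call");
      t.setDaemon(true);
      return t;
    });
    try {
      Future<T> future = executor.submit(call);
      // Blocks for at most the given timeout instead of indefinitely.
      return future.get(timeout, unit);
    } finally {
      // Interrupts the worker if it is still running. This only stops the
      // underlying call if that call responds to thread interruption.
      executor.shutdownNow();
    }
  }
}
```

A caller would wrap the query, for example TimedCall.callWithTimeout(() -> fetchBlocks(), 60, TimeUnit.SECONDS), where fetchBlocks is a stand-in for the real NameNode request, and treat TimeoutException as a signal to retry or skip the source node rather than wait indefinitely.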