[ https://issues.apache.org/jira/browse/HDFS-10977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhe Zhang updated HDFS-10977:
-----------------------------
    Attachment: HDFS-10977-reproduce.patch

Uploading a patch to demonstrate the issue (not the solution). Log from running 
the test:
{code}
2016-10-06 16:27:19,627 [DataXceiver for client DFSClient_NONMAPREDUCE_-1700201955_1 at /127.0.0.1:54189 [Receiving block BP-1747605867-172.21.144.175-1475796435325:blk_1073741825_1001]] INFO  datanode.DataNode (DataXceiver.java:writeBlock(705)) - Receiving BP-1747605867-172.21.144.175-1475796435325:blk_1073741825_1001 src: /127.0.0.1:54189 dest: /127.0.0.1:54178
2016-10-06 16:27:19,654 [DataXceiver for client DFSClient_NONMAPREDUCE_-1700201955_1 at /127.0.0.1:54190 [Receiving block BP-1747605867-172.21.144.175-1475796435325:blk_1073741825_1001]] INFO  datanode.DataNode (DataXceiver.java:writeBlock(705)) - Receiving BP-1747605867-172.21.144.175-1475796435325:blk_1073741825_1001 src: /127.0.0.1:54190 dest: /127.0.0.1:54183
2016-10-06 16:27:19,660 [DataXceiver for client DFSClient_NONMAPREDUCE_-1700201955_1 at /127.0.0.1:54191 [Receiving block BP-1747605867-172.21.144.175-1475796435325:blk_1073741825_1001]] INFO  datanode.DataNode (DataXceiver.java:writeBlock(705)) - Receiving BP-1747605867-172.21.144.175-1475796435325:blk_1073741825_1001 src: /127.0.0.1:54191 dest: /127.0.0.1:54174
2016-10-06 16:27:19,703 [IPC Server handler 2 on 54173] INFO  hdfs.StateChange (FSNamesystem.java:fsync(3027)) - BLOCK* fsync: /system/balancer.id for DFSClient_NONMAPREDUCE_-1700201955_1
2016-10-06 16:32:19,760 [IPC Server handler 5 on 54173] WARN  BlockStateChange (BlockManager.java:getBlocksWithLocations(1269)) - BLOCK* getBlocks: Asking for blocks from an unrecorded node null:0
{code}

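So the connector waited 5 minutes: the fsync of /system/balancer.id is logged at 16:27:19 and the next getBlocks call does not appear until 16:32:19.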
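
For illustration only (this is neither the attached reproduce patch nor a proposed fix): a minimal sketch of what "query with a timeout" could look like in plain Java, by running the blocking call on a worker thread and bounding the wait with Future.get. The class TimedCall and the method callWithTimeout are hypothetical names, not Hadoop APIs, and the lambdas stand in for the real NameNode query.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not part of Hadoop. */
public class TimedCall {

  /**
   * Runs a blocking call on a worker thread and waits at most timeoutMs
   * for its result. Throws TimeoutException if the call has not finished
   * by then.
   */
  public static <T> T callWithTimeout(Callable<T> call, long timeoutMs)
      throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<T> future = executor.submit(call);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Ask the worker to stop; this only interrupts the thread, so a call
      // that ignores interrupts may keep running in the background.
      future.cancel(true);
      throw e;
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    // A call that returns promptly (stand-in for a healthy query).
    String fast = callWithTimeout(() -> "block list", 1000);
    System.out.println(fast);

    // A call that hangs (stand-in for the stuck query).
    try {
      String slow = callWithTimeout(() -> {
        Thread.sleep(5000);
        return "late";
      }, 100);
      System.out.println(slow);
    } catch (TimeoutException e) {
      System.out.println("timed out");
    }
  }
}
```

The caveat in the comment matters: thread interruption does not necessarily unblock a thread waiting on network I/O, so an actual fix would more likely need the timeout enforced at the RPC layer rather than around the call.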

> Balancer should query NameNode with a timeout
> ---------------------------------------------
>
>                 Key: HDFS-10977
>                 URL: https://issues.apache.org/jira/browse/HDFS-10977
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-10977-reproduce.patch
>
>
> We found a case where {{Dispatcher}} was stuck at {{getBlockList}} *forever* 
> (well, several hours when we found it).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
