[ https://issues.apache.org/jira/browse/HDFS-9910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181486#comment-15181486 ]

Inigo Goiri commented on HDFS-9910:
-----------------------------------

To keep heartbeats from getting stuck on disk operations, we propose:
# Make {{transferBlock()}} in {{DataNode}} asynchronous, so the heartbeat thread
does not have to wait for the block check before it can send the next heartbeat.
# Make {{DF}} asynchronous when monitoring disk usage.
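For the second item, one possible shape is sketched below, assuming a periodic background refresh is acceptable: a daemon thread re-measures free space on a fixed delay, and the heartbeat only reads the last cached value, so a slow disk delays the refresh rather than the heartbeat. This is not Hadoop's {{DF}} class; the class name {{CachedDiskUsage}} is hypothetical, and {{File.getUsableSpace()}} stands in for running {{df}}.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch: disk free space is measured on a background thread,
 * so callers such as the heartbeat only read a cached value and never block on disk.
 */
class CachedDiskUsage implements AutoCloseable {
  private final File dir;
  private final AtomicLong available = new AtomicLong(-1L); // -1 = not measured yet
  private final AtomicLong lastRefreshMillis = new AtomicLong(0L);
  private final ScheduledExecutorService refresher;

  CachedDiskUsage(File dir, long refreshIntervalMs) {
    this.dir = dir;
    this.refresher = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "disk-usage-refresher");
      t.setDaemon(true); // do not keep the JVM alive just for this thread
      return t;
    });
    // The potentially slow disk call only ever runs on the refresher thread.
    refresher.scheduleWithFixedDelay(this::refresh, 0, refreshIntervalMs, TimeUnit.MILLISECONDS);
  }

  private void refresh() {
    // Stand-in for running `df`: this may block under heavy disk I/O,
    // but only the background thread waits.
    long space = dir.getUsableSpace();
    lastRefreshMillis.set(System.currentTimeMillis()); // written before the value it describes
    available.set(space);
  }

  /** Non-blocking read of the last measured value (-1 if none yet). */
  long getAvailable() {
    return available.get();
  }

  /** Time of the last refresh, so callers can detect a stale value. */
  long getLastRefreshMillis() {
    return lastRefreshMillis.get();
  }

  @Override
  public void close() {
    refresher.shutdownNow();
  }

  public static void main(String[] args) throws InterruptedException {
    try (CachedDiskUsage usage = new CachedDiskUsage(new File("."), 1000)) {
      Thread.sleep(200); // give the first refresh a chance to run
      System.out.println("available bytes (cached): " + usage.getAvailable());
    }
  }
}
```

The trade-off is staleness: the heartbeat may report a value up to one refresh interval old, which {{getLastRefreshMillis()}} lets a caller detect. Note that {{scheduleWithFixedDelay}} suppresses later runs if a run throws, so a real version would need to catch and log errors inside {{refresh()}}.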

> Datanode heartbeats can get blocked by disk in FsDatasetImpl#checkBlock()
> -------------------------------------------------------------------------
>
>                 Key: HDFS-9910
>                 URL: https://issues.apache.org/jira/browse/HDFS-9910
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.7.2
>            Reporter: Inigo Goiri
>            Assignee: Hua Liu
>
> When a DataNode needs to transfer a block, it validates the block in the 
> heartbeat thread by invoking the {{checkBlock()}} method of {{FsDatasetImpl}}, 
> which checks whether the block exists and gets the block length. If the 
> block is valid, it then spins off a thread to do the actual block transfer. 
> We found that during heavy disk I/O the heartbeat thread can hang on 
> {{replicaInfo.getBlockFile().exists()}} for more than 10 minutes.
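In the flow described above, only the transfer itself runs on a separate thread; the existence and length checks still run on the heartbeat thread. A minimal sketch of the first proposal, assuming a small worker pool is acceptable, moves those checks into the worker too, so the heartbeat thread only enqueues work and returns. The names {{AsyncBlockTransfer}}, {{scheduleTransfer()}} and {{transfer()}} are hypothetical and not DataNode APIs; a real DataNode would probably also need to report a missing block instead of just logging it.

```java
import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch: the heartbeat thread only enqueues the transfer; the
 * block-file checks (exists() and length()) run on a worker thread, so slow
 * disk I/O delays the transfer but not the next heartbeat.
 */
class AsyncBlockTransfer implements AutoCloseable {
  private final ExecutorService workers = Executors.newFixedThreadPool(2, r -> {
    Thread t = new Thread(r, "block-transfer");
    t.setDaemon(true);
    return t;
  });

  /** Called from the heartbeat thread; returns immediately without touching the disk. */
  Future<Boolean> scheduleTransfer(File blockFile) {
    return workers.submit(() -> {
      // All disk access happens here, off the heartbeat thread.
      if (!blockFile.exists()) {
        System.err.println("Block file missing, skipping transfer: " + blockFile);
        return false;
      }
      long length = blockFile.length();
      transfer(blockFile, length);
      return true;
    });
  }

  /** Placeholder for the actual data transfer (hypothetical). */
  private void transfer(File blockFile, long length) {
    System.out.println("Transferring " + blockFile + " (" + length + " bytes)");
  }

  @Override
  public void close() {
    workers.shutdown(); // lets already-submitted transfers finish
  }

  public static void main(String[] args) throws Exception {
    File block = File.createTempFile("blk_", ".data");
    block.deleteOnExit();
    try (AsyncBlockTransfer dispatcher = new AsyncBlockTransfer()) {
      Future<Boolean> ok = dispatcher.scheduleTransfer(block); // returns at once
      Future<Boolean> missing = dispatcher.scheduleTransfer(new File(block.getPath() + ".gone"));
      System.out.println("existing block transferred: " + ok.get());
      System.out.println("missing block transferred: " + missing.get());
    }
  }
}
```

The returned {{Future}} is only for demonstration; the heartbeat thread itself would not call {{get()}}, since that would reintroduce the blocking wait.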



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
