[ https://issues.apache.org/jira/browse/HDFS-9910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181486#comment-15181486 ]
Inigo Goiri commented on HDFS-9910:
-----------------------------------
To keep heartbeats from getting stuck behind disk operations, we propose:
# Make {{transferBlock()}} in {{DataNode}} asynchronous so the heartbeat
thread does not have to wait on it while heartbeating.
# Make {{DF}} asynchronous when it monitors the disk.
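
As a rough illustration of the two proposals above, here is a minimal, self-contained sketch and not the actual patch. The class {{AsyncHeartbeatSketch}}, its methods, and the pool size are hypothetical names chosen for this example; it assumes that block validation and disk-usage polling can each be handed to a separate worker pool, with the heartbeat thread only reading a cached value or receiving a future.

```java
import java.io.File;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; none of these names come from the Hadoop code base.
class AsyncHeartbeatSketch {
    // Worker pool for anything that touches the disk (size is an assumption).
    private final ExecutorService diskPool = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "disk-worker");
        t.setDaemon(true);
        return t;
    });

    // Last usable-space reading; stands in for an asynchronous DF. -1 means "not yet sampled".
    private final AtomicLong cachedUsableBytes = new AtomicLong(-1L);

    // Called from the heartbeat thread. Returns immediately; the potentially
    // slow exists()/length() calls run on the disk pool instead.
    CompletableFuture<Boolean> transferBlockAsync(File blockFile) {
        return CompletableFuture.supplyAsync(() -> {
            if (!blockFile.exists()) {      // may block under heavy disk I/O
                return false;
            }
            long length = blockFile.length();
            // A real DataNode would start the block transfer here, using length.
            return true;
        }, diskPool);
    }

    // Schedules a disk-usage refresh without blocking the caller.
    void refreshDiskUsageAsync(File volumeRoot) {
        diskPool.execute(() -> cachedUsableBytes.set(volumeRoot.getUsableSpace()));
    }

    // Non-blocking read of the most recent sample for use in a heartbeat.
    long getCachedUsableBytes() {
        return cachedUsableBytes.get();
    }

    void shutdown() {
        diskPool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        AsyncHeartbeatSketch sketch = new AsyncHeartbeatSketch();
        File block = File.createTempFile("blk_", ".data");
        block.deleteOnExit();
        // The heartbeat thread would normally not wait on this future at all.
        boolean valid = sketch.transferBlockAsync(block).get(5, TimeUnit.SECONDS);
        System.out.println("block valid: " + valid);
        sketch.shutdown();
    }
}
```

With this shape, a slow {{exists()}} call would stall a disk worker rather than the heartbeat thread. The trade-off is that the heartbeat may report a slightly stale disk-usage value.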
> Datanode heartbeats can get blocked by disk in FsDatasetImpl#checkBlock()
> -------------------------------------------------------------------------
>
> Key: HDFS-9910
> URL: https://issues.apache.org/jira/browse/HDFS-9910
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.7.2
> Reporter: Inigo Goiri
> Assignee: Hua Liu
>
> When a data node needs to transfer a block, it validates the block in the
> heartbeat thread invoking the {{checkBlock()}} method of {{FsDatasetImpl}},
> where it checks whether the block exists and gets the block length. If the
> block is valid, it then spins off a thread to do the actual block transfer.
> We found that during heavy disk IO the heartbeat thread hangs on
> {{replicaInfo.getBlockFile().exists()}} for more than 10 minutes.