[
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755754#comment-15755754
]
Wei-Chiu Chuang commented on HDFS-9901:
---------------------------------------
FsDatasetImpl#checkBlock does not perform any disk I/O at all. It looks up an
in-memory structure. I don't understand why there's I/O involved. Please
correct me if I am wrong.
Also, FsDatasetImpl#checkBlock is called without holding a lock, which is unusual
(this is existing code).
I think moving transferBlock to an asynchronous thread is fine, though I still
don't see why it is needed.
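For reference, moving transferBlock off the heartbeat thread can be as simple as handing the command to an executor. This is a minimal sketch, not the code in the attached patches. It assumes a hypothetical `BlockChecker` interface standing in for the existence/length check that `FsDatasetImpl#checkBlock` performs, and uses plain string block IDs instead of `ExtendedBlock`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the existence/length check done by checkBlock. */
interface BlockChecker {
  void check(String blockId) throws Exception;
}

/**
 * Hypothetical sketch: the heartbeat thread only enqueues the transfer command;
 * the block check (which might block under heavy disk I/O) and the transfer
 * itself run on a worker thread.
 */
class AsyncTransferSketch {
  private final ExecutorService transferExecutor = Executors.newSingleThreadExecutor();
  private final BlockChecker checker;

  AsyncTransferSketch(BlockChecker checker) {
    this.checker = checker;
  }

  /** Called from the heartbeat thread; returns as soon as the task is queued. */
  Future<Boolean> transferBlockAsync(String blockId) {
    return transferExecutor.submit(() -> {
      try {
        checker.check(blockId); // a slow check only stalls this worker thread
      } catch (Exception e) {
        return false; // block missing or too short: do not transfer it
      }
      // The actual data transfer would start here.
      return true;
    });
  }

  void shutdown() throws InterruptedException {
    transferExecutor.shutdown();
    transferExecutor.awaitTermination(5, TimeUnit.SECONDS);
  }
}
```

With this sketch, the heartbeat thread only pays for `submit()`. Whether the extra thread is worth it still depends on whether the check actually blocks, which is the open question above.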
> Move disk IO out of the heartbeat thread
> ----------------------------------------
>
> Key: HDFS-9901
> URL: https://issues.apache.org/jira/browse/HDFS-9901
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Hua Liu
> Assignee: Hua Liu
> Attachments:
> 0001-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch,
> 0002-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch,
> 0003-HDFS-9901-Move-disk-IO-out-of-the-heartbeat-thread.patch,
> 0004-HDFS-9901-move-diskIO-out-of-the-heartbeat-thread.patch,
> 0005-HDFS-9901-Move-diskIO-out-of-heartbeat-thread.patch
>
>
> During heavy disk IO, we noticed that the heartbeat thread hangs in the checkBlock
> method, which checks the existence and length of a block before spinning off a
> thread to do the actual transfer. In extreme cases, the heartbeat thread hung for
> more than 10 minutes, so the namenode marked the datanode as dead and started
> replicating its blocks. This caused more disk IO on other nodes and could
> potentially bring them down.
> The patch contains two changes:
> 1. Makes disk monitoring via DF asynchronous by creating a thread that checks the
> disk and updates the disk status periodically. When the heartbeat thread generates
> a storage report, it reads the disk usage information from memory, so the
> heartbeat thread does not get blocked during heavy disk IO.
> 2. Moves the checks in DataNode#transferBlock() (which require disk access) into
> a separate thread, so the heartbeat thread does not have to wait for them.
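Change 1 above amounts to caching disk usage that a background thread refreshes, so that readers never touch the disk. The following is a minimal sketch, not the code in the attached patches. It assumes a hypothetical `CachedDiskUsage` class that calls `java.io.File#getTotalSpace` and `File#getUsableSpace` in place of Hadoop's `DF`, and an arbitrary refresh interval.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical "asynchronous DF": a daemon thread refreshes, readers read memory. */
class CachedDiskUsage implements AutoCloseable {
  /** Immutable snapshot so capacity and available always come from one reading. */
  static final class Snapshot {
    final long capacity;
    final long available;
    final long takenAtMillis;

    Snapshot(long capacity, long available, long takenAtMillis) {
      this.capacity = capacity;
      this.available = available;
      this.takenAtMillis = takenAtMillis;
    }
  }

  private final File dir;
  private final ScheduledExecutorService refresher;
  // volatile: readers always see the most recently published snapshot, lock-free
  private volatile Snapshot latest;

  CachedDiskUsage(File dir, long refreshIntervalMillis) {
    this.dir = dir;
    refresh(); // take one reading up front so readers never see null
    this.refresher = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "disk-usage-refresher");
      t.setDaemon(true); // do not keep the JVM alive
      return t;
    });
    refresher.scheduleWithFixedDelay(this::refresh,
        refreshIntervalMillis, refreshIntervalMillis, TimeUnit.MILLISECONDS);
  }

  // Runs on the refresher thread (and once in the constructor). If the disk is
  // slow, only this thread waits; readers keep getting the previous snapshot.
  private void refresh() {
    latest = new Snapshot(dir.getTotalSpace(), dir.getUsableSpace(),
        System.currentTimeMillis());
  }

  /** Cheap, non-blocking read, e.g. for building a storage report. */
  Snapshot get() {
    return latest;
  }

  @Override
  public void close() {
    refresher.shutdownNow();
  }
}
```

Publishing a single immutable snapshot through one volatile field lets a reader get capacity and available space from the same reading without locking. The trade-off is that a report can be up to one refresh interval stale.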
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)