[ 
https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189962#comment-15189962
 ] 

Inigo Goiri commented on HDFS-9901:
-----------------------------------

[~hualiu], we should add comments in the new thread creations specifying why we 
do it in threads.
Probably it would also be good to add javadoc to the new Threads explaining 
their purpose:
# DFRefreshThread: Get disk statistics in a thread to avoid Disk IO in 
heartbeat thread.
# DataCheckAndTransfer: Perform the data check in a separate thread to avoid 
Disk IO in the heartbeat thread.

Not sure there's any new unit test to add. Everything should already be checked 
by other unit tests.

> Move disk IO out of the heartbeat thread
> ----------------------------------------
>
>                 Key: HDFS-9901
>                 URL: https://issues.apache.org/jira/browse/HDFS-9901
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Hua Liu
>            Assignee: Hua Liu
>         Attachments: 
> 0001-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch, 
> 0002-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch, 
> 0003-HDFS-9901-Move-disk-IO-out-of-the-heartbeat-thread.patch
>
>
> During heavy disk IO, we noticed hearbeat thread hangs on checkBlock method, 
> which checks the existence and length of a block before spins off a thread to 
> do the actual transferring. In extreme cases, the heartbeat thread hang more 
> than 10 minutes so the namenode marked the datanode as dead and started 
> replicating its blocks, which caused more disk IO on other nodes and can 
> potentially brought them down.
> The patch contains two changes:
> 1. Makes DF asynchronous when monitoring the disk by creating a thread that 
> checks the disk and updates the disk status periodically. When the heartbeat 
> threads generates storage report, it then reads disk usage information from 
> memory so that the heartbeat thread won't get blocked during heavy diskIO. 
> 2. Makes the checks (which required disk accesses) in transferBlock() in 
> DataNode into a separate thread so the heartbeat does not have to wait for 
> this when heartbeating.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to