[jira] Commented: (HADOOP-4584) Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode

Konstantin Shvachko (JIRA) Tue, 03 Mar 2009 18:29:22 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678553#action_12678553
 ]


Konstantin Shvachko commented on HADOOP-4584:
---------------------------------------------

Separating a HB thread from the main offerService thread has the following 
disadvantages:
# This does not remove contention on processing block reports.
That is, the data-node is still blocked while preparing the block report and 
cannot do anything useful, like sending blockReceived or processing commands 
from the name-node. 
The only good thing is that it does not die.
# We lose automatic data-node activity throttling with this. 
That is, while the data-node is busy it still sends heartbeats, and the 
name-node replies with commands, which pile up in the queue because the DN 
cannot process them.
This can probably be solved with smart command queue maintenance or by 
adjusting the heartbeat frequency according to the length of the queue, but 
it will require more work and very thorough tuning.
# Related to the previous point: administrators will no longer be able to tell 
that a data-node is in trouble just by looking at its heartbeat interval.
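
The heartbeat-frequency adjustment mentioned in the second point could look roughly like the following. This is only a minimal sketch to make the idea concrete: the class name, the threshold parameter, and the 30-second cap are all hypothetical and are not taken from any patch on this issue. Only the 3-second base interval comes from the issue description.

```java
// Hypothetical sketch, not DataNode code: stretch the heartbeat interval
// as the queue of unprocessed name-node commands grows.
final class HeartbeatThrottle {
    // Base interval from the issue description (heartbeat every 3 seconds).
    static final long BASE_INTERVAL_MS = 3000L;
    // Assumed upper bound so a busy node still reports in; illustrative value.
    static final long MAX_INTERVAL_MS = 30000L;

    /**
     * Returns the delay before the next heartbeat.
     *
     * @param queuedCommands number of commands received but not yet processed
     * @param threshold      queue length up to which no throttling is applied
     */
    static long nextIntervalMs(int queuedCommands, int threshold) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        if (queuedCommands <= threshold) {
            return BASE_INTERVAL_MS;
        }
        // Scale linearly with the backlog, then clamp to the assumed maximum.
        long scaled = BASE_INTERVAL_MS * queuedCommands / threshold;
        return Math.min(scaled, MAX_INTERVAL_MS);
    }
}
```

Even in this simple form there are two knobs (threshold and cap) that would need tuning against real cluster behaviour, which is the extra work referred to above.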

So I would argue for keeping HB processing in the main offerService loop and 
instead moving block report processing into a separate thread.
In general we should keep all heavy-weight operations, like delete-blocks, away 
from the offerService loop. They can be done in separate threads.
Does that make me a supporter of "Option 3"?
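
To illustrate the shape of what I am arguing for, here is a minimal sketch under stated assumptions. It is not the DataNode code and not any of the attached patches: OfferServiceSketch, tick, reportReady, and the "sent" log are hypothetical names, and the block scan is modelled as an arbitrary Callable. The main loop sends the heartbeat on every pass; the block report is built on a worker thread and sent only once it is ready.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: heartbeats stay in the main loop, while the slow
// block report scan runs on a separate worker thread.
final class OfferServiceSketch {
    // Single daemon worker so at most one scan runs at a time.
    private final ExecutorService reportBuilder =
        Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "block-report-builder");
            t.setDaemon(true);
            return t;
        });
    private final Callable<long[]> scanner; // stands in for the directory scan
    private Future<long[]> pendingReport;   // non-null while a scan is in flight
    final List<String> sent = new ArrayList<>(); // stands in for RPCs to the name-node

    OfferServiceSketch(Callable<long[]> scanner) {
        this.scanner = scanner;
    }

    /** One pass of the main loop; never waits for the scan to finish. */
    void tick(boolean reportDue) {
        sent.add("heartbeat");
        if (reportDue && pendingReport == null) {
            pendingReport = reportBuilder.submit(scanner);
        }
        if (reportReady()) {
            try {
                long[] blocks = pendingReport.get(); // already done, returns at once
                sent.add("blockReport:" + blocks.length);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                sent.add("blockReportFailed");
            } catch (ExecutionException e) {
                sent.add("blockReportFailed");
            }
            pendingReport = null;
        }
    }

    /** True once the worker has finished building a report. */
    boolean reportReady() {
        return pendingReport != null && pendingReport.isDone();
    }

    void shutdown() {
        reportBuilder.shutdownNow();
    }
}
```

Because the main loop still handles heartbeats and command processing itself, a data-node that is too busy to run the loop still shows up with a stretched heartbeat interval, so the throttling and the diagnostic signal described in points 2 and 3 are preserved.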

> Slow generation of blockReport at DataNode causes delay of sending heartbeat 
> to NameNode
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.20.0
>
>         Attachments: 4584.hbthread.patch, 4584.patch, 4584.patch, 4584.patch, 
> 4584.patch, 4584.patch, 4584.patch
>
>
> Sometimes, due to disk or other problems, a datanode takes minutes or tens 
> of minutes to generate a block report. This prevents the datanode from 
> sending a heartbeat to the NameNode every 3 seconds. In the worst case, the 
> NameNode detects a lost heartbeat and wrongly decides that the datanode is 
> dead.
> It would be nice to have two threads instead. One thread scans the data 
> directories, generates block reports, and executes the requests sent by the 
> NameNode; the other thread sends heartbeats and block reports and picks up 
> the requests from the NameNode. With these two threads, the sending of 
> heartbeats will not be delayed by a slow block report or slow execution of 
> NameNode requests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
