[ 
https://issues.apache.org/jira/browse/HDFS-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-9239:
--------------------------------
    Attachment: HDFS-9239.001.patch

I'm attaching patch v001.  This implements the lifeline protocol calls as 
described in the design document.
* Lifeline messages are sent in a separate thread from the existing 
{{BPServiceActor}} thread to avoid getting stalled on other {{BPServiceActor}} 
activity.  The scheduling of lifeline messages can be controlled by a few new 
configuration properties, which are documented in hdfs-default.xml.
* The NameNode implementation avoids the namesystem lock while processing 
lifeline messages.
* {{TestDataNodeLifeline}} is a new test suite.  It uses Mockito to inject 
delays into heartbeat processing and then verifies that lifeline messages 
still keep the DataNode alive.  There are no hard-coded sleep times.  Instead, 
{{CountDownLatch}} coordinates the threads so that the test is deterministic.
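
To illustrate the latch-based coordination described above, here is a minimal, 
self-contained sketch.  It is not code from the patch: {{LifelineSketch}}, 
{{FakeNameNode}}, and their methods are hypothetical stand-ins, and a plain 
blocking call replaces the Mockito-injected delay.  It assumes only that the 
heartbeat and the lifeline run on separate threads.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class LifelineSketch {
    /** Hypothetical stand-in for the NameNode side; not the real Hadoop API. */
    static class FakeNameNode {
        final AtomicInteger lifelinesReceived = new AtomicInteger();
        final CountDownLatch heartbeatEntered = new CountDownLatch(1);
        final CountDownLatch releaseHeartbeat = new CountDownLatch(1);
        final CountDownLatch lifelinesSeen;

        FakeNameNode(int expectedLifelines) {
            lifelinesSeen = new CountDownLatch(expectedLifelines);
        }

        /** Simulates a heartbeat that stalls until the test releases it. */
        void heartbeat() throws InterruptedException {
            heartbeatEntered.countDown();
            releaseHeartbeat.await();
        }

        /** Lifeline handling never waits on the heartbeat path. */
        void lifeline() {
            lifelinesReceived.incrementAndGet();
            lifelinesSeen.countDown();
        }
    }

    /** Returns the number of lifelines received while the heartbeat was stalled. */
    static int run(int expectedLifelines) {
        FakeNameNode nn = new FakeNameNode(expectedLifelines);

        Thread heartbeatThread = new Thread(() -> {
            try {
                nn.heartbeat();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "heartbeat");

        Thread lifelineThread = new Thread(() -> {
            try {
                // Send lifelines only once the heartbeat is known to be stalled.
                nn.heartbeatEntered.await();
                for (int i = 0; i < expectedLifelines; i++) {
                    nn.lifeline();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "lifeline");

        heartbeatThread.start();
        lifelineThread.start();
        try {
            // The timeout is only a safety bound so a broken run cannot hang;
            // ordering is enforced by the latches, not by sleeping.
            boolean seen = nn.lifelinesSeen.await(10, TimeUnit.SECONDS);
            boolean heartbeatStillStalled = heartbeatThread.isAlive();
            int received = nn.lifelinesReceived.get();

            nn.releaseHeartbeat.countDown();
            heartbeatThread.join();
            lifelineThread.join();

            if (!seen || !heartbeatStillStalled) {
                throw new IllegalStateException("lifelines did not bypass the stalled heartbeat");
            }
            return received;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("lifelines received while heartbeat stalled: " + run(3));
    }
}
```

Because the lifeline thread waits on {{heartbeatEntered}} before sending, and 
the heartbeat is released only after {{lifelinesSeen}} reaches zero, the 
ordering does not depend on timing.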


> DataNode Lifeline Protocol: an alternative protocol for reporting DataNode 
> liveness
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-9239
>                 URL: https://issues.apache.org/jira/browse/HDFS-9239
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: DataNode-Lifeline-Protocol.pdf, HDFS-9239.001.patch
>
>
> This issue proposes introduction of a new feature: the DataNode Lifeline 
> Protocol.  This is an RPC protocol that is responsible for reporting liveness 
> and basic health information about a DataNode to a NameNode.  Compared to the 
> existing heartbeat messages, it is lightweight and not prone to the resource 
> contention problems that can currently harm accurate tracking of DataNode 
> liveness.  The attached design document contains more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)