[ https://issues.apache.org/jira/browse/HDFS-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Nauroth updated HDFS-9239:
--------------------------------
Attachment: HDFS-9239.001.patch
I'm attaching patch v001. This implements the lifeline protocol calls as
described in the design document.
* Lifeline messages are sent in a separate thread from the existing
{{BPServiceActor}} thread to avoid getting stalled on other {{BPServiceActor}}
activity. The scheduling of lifeline messages can be controlled by a few new
configuration properties, which are documented in hdfs-default.xml.
* The NameNode implementation avoids the namesystem lock while processing
lifeline messages.
* {{TestDataNodeLifeline}} is a new test suite. It works by using Mockito to
inject delays in heartbeat processing and then verifying that lifeline messages
still keep the DataNode alive. There are no hard-coded sleep times. Instead,
{{CountDownLatch}} coordinates the behavior of multiple threads so that the test
is deterministic.
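The lock-avoidance and latch-coordination ideas in the list above can be sketched in plain Java. This is a minimal, hypothetical model, not the code in HDFS-9239.001.patch: {{ToyNameNode}}, {{handleHeartbeat}}, {{handleLifeline}} and the {{lastContact}} map are invented names. Heartbeat handling takes a stand-in for the namesystem lock, while lifeline handling only updates a concurrent map, so a lifeline sent from its own thread is processed even while heartbeat processing is stalled holding the lock. Unlike {{TestDataNodeLifeline}}, which uses Mockito to inject the delay, this sketch builds the stall directly into the toy handler; {{CountDownLatch}} orders the threads without any sleeps.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LifelineSketch {

    /** Hypothetical NameNode model; names are illustrative, not HDFS APIs. */
    static class ToyNameNode {
        // Stand-in for the namesystem lock that heartbeat processing needs.
        final ReentrantLock namesystemLock = new ReentrantLock();
        // Last logical time each DataNode was heard from.
        final Map<String, Long> lastContact = new ConcurrentHashMap<>();

        void handleHeartbeat(String dnId, long now, CountDownLatch holding,
                             CountDownLatch release) throws InterruptedException {
            namesystemLock.lock();
            try {
                holding.countDown(); // signal: the lock is now held
                release.await();     // simulate slow heartbeat processing
                lastContact.put(dnId, now);
            } finally {
                namesystemLock.unlock();
            }
        }

        // Lifeline path: no namesystem lock, only a thread-safe map update.
        void handleLifeline(String dnId, long now) {
            lastContact.merge(dnId, now, Long::max);
        }
    }

    /** True if a lifeline updated liveness while heartbeat processing held the lock. */
    static boolean lifelineWorksDuringStalledHeartbeat() throws InterruptedException {
        ToyNameNode nn = new ToyNameNode();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch releaseHeartbeat = new CountDownLatch(1);
        CountDownLatch lifelineDone = new CountDownLatch(1);

        Thread heartbeat = new Thread(() -> {
            try {
                nn.handleHeartbeat("dn1", 1L, lockHeld, releaseHeartbeat);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        heartbeat.start();
        lockHeld.await(); // the heartbeat thread now holds the lock

        // Separate lifeline thread, loosely analogous to sending lifelines
        // outside the BPServiceActor thread.
        Thread lifeline = new Thread(() -> {
            nn.handleLifeline("dn1", 2L);
            lifelineDone.countDown();
        });
        lifeline.start();

        // Bounded wait only as a safety net; the lifeline never takes the lock.
        boolean done = lifelineDone.await(10, TimeUnit.SECONDS);
        boolean lockStillHeld = nn.namesystemLock.isLocked();
        Long seen = nn.lastContact.get("dn1");

        releaseHeartbeat.countDown(); // let the stalled heartbeat finish
        heartbeat.join();
        lifeline.join();
        return done && lockStillHeld && seen != null && seen == 2L;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("lifeline processed during stalled heartbeat: "
                + lifelineWorksDuringStalledHeartbeat());
    }
}
```

Running {{main}} prints {{lifeline processed during stalled heartbeat: true}}. The latches make the ordering explicit: the lifeline is only sent after the heartbeat thread is known to hold the lock, and the heartbeat is only released after the lifeline has been observed.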
> DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness
> -----------------------------------------------------------------------------------
>
> Key: HDFS-9239
> URL: https://issues.apache.org/jira/browse/HDFS-9239
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Attachments: DataNode-Lifeline-Protocol.pdf, HDFS-9239.001.patch
>
>
> This issue proposes the introduction of a new feature: the DataNode Lifeline
> Protocol. This is an RPC protocol that is responsible for reporting liveness
> and basic health information about a DataNode to a NameNode. Compared to the
> existing heartbeat messages, it is lightweight and not prone to the resource
> contention problems that currently can harm accurate tracking of DataNode
> liveness. The attached design document contains more details.
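The liveness-reporting idea can be illustrated with a small, hypothetical Java model. The names {{LifelineProtocol}}, {{HealthReport}}, {{LivenessTracker}}, their fields, and the expiry rule (a DataNode counts as alive if any heartbeat or lifeline arrived within a fixed window) are assumptions for illustration only; the real protocol messages and NameNode logic are the ones described in the design document and patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LivenessSketch {

    /** Hypothetical lightweight health payload; fields are illustrative only. */
    static final class HealthReport {
        final long remainingBytes;
        final int activeTransfers;

        HealthReport(long remainingBytes, int activeTransfers) {
            this.remainingBytes = remainingBytes;
            this.activeTransfers = activeTransfers;
        }
    }

    /** Hypothetical RPC surface: one small call, no block reports or commands. */
    interface LifelineProtocol {
        void sendLifeline(String datanodeId, HealthReport report, long nowMillis);
    }

    /** Toy NameNode-side tracker fed by both heartbeats and lifelines. */
    static final class LivenessTracker implements LifelineProtocol {
        private final Map<String, Long> lastContactMillis = new ConcurrentHashMap<>();
        private final Map<String, HealthReport> lastHealth = new ConcurrentHashMap<>();
        private final long expiryMillis;

        LivenessTracker(long expiryMillis) {
            this.expiryMillis = expiryMillis;
        }

        void onHeartbeat(String datanodeId, long nowMillis) {
            lastContactMillis.merge(datanodeId, nowMillis, Long::max);
        }

        @Override
        public void sendLifeline(String datanodeId, HealthReport report, long nowMillis) {
            lastContactMillis.merge(datanodeId, nowMillis, Long::max);
            lastHealth.put(datanodeId, report);
        }

        HealthReport latestHealth(String datanodeId) {
            return lastHealth.get(datanodeId);
        }

        /** Alive if heard from, by either message type, within the expiry window. */
        boolean isAlive(String datanodeId, long nowMillis) {
            Long last = lastContactMillis.get(datanodeId);
            return last != null && nowMillis - last <= expiryMillis;
        }
    }

    public static void main(String[] args) {
        LivenessTracker nn = new LivenessTracker(30_000L);
        nn.onHeartbeat("dn1", 0L);
        // Heartbeats stall; a lifeline at t=25s still keeps dn1 alive at t=40s.
        nn.sendLifeline("dn1", new HealthReport(1L << 30, 3), 25_000L);
        System.out.println(nn.isAlive("dn1", 40_000L)); // prints true
        System.out.println(nn.isAlive("dn1", 60_000L)); // prints false
    }
}
```

In this toy model the lifeline path touches only two concurrent maps, which is one way to keep it cheap compared with a full heartbeat; whether the actual NameNode code tracks contact time this way is not stated here.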
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)