[
https://issues.apache.org/jira/browse/HDFS-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003219#comment-15003219
]
Ming Ma commented on HDFS-9239:
-------------------------------
Sorry for jumping into the discussion late. While we haven't seen any
recent issues caused by DNs incorrectly marked as dead, maybe this feature
could mitigate the replication storm issue, where DNs incorrectly marked as
dead cause even more replication?
* It seems the new RPC server is being introduced to work around a limitation
of the existing RPC functionality, which only supports QoS based on user
names. Imagine if the RPC server could provide differentiated service based on
method names; then we could just add {{sendLifeline}} to the existing
{{DatanodeProtocol}} and have the same RPC server process the method call at
the highest priority. Adding method-based RPC QoS could also help other use
cases, for example, if we want to prioritize the existing heartbeat over IBR.
* Regarding the DN contention scenario that blocks it from sending
{{sendLifeline}} to the NN, we could skip all info such as storage reports.
But if the DN is already in such a state, maybe not sending {{sendLifeline}}
is what we want anyway.
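
To make the method-based QoS idea in the first point concrete, here is a
minimal sketch of a call queue that orders pending calls by method name
instead of by user. It is not Hadoop's RPC server or its call queue: the class
{{MethodPriorityCallQueue}}, its methods, and the priority values are all
hypothetical, and {{sendLifeline}} is only the method proposed above. Within
one priority level, calls keep their arrival order.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch only; not Hadoop's actual RPC server or call queue.
final class MethodPriorityCallQueue {

  // Assumed priority levels for illustration; a lower value is served first.
  static int priorityOf(String method) {
    switch (method) {
      case "sendLifeline":
        return 0; // liveness signal, highest priority
      case "sendHeartbeat":
        return 1; // regular heartbeat
      case "blockReceivedAndDeleted":
        return 2; // incremental block report (IBR)
      default:
        return 3; // everything else
    }
  }

  // A queued call: the method name plus the keys used for ordering.
  private static final class Call implements Comparable<Call> {
    final String method;
    final int priority;
    final long seq;

    Call(String method, int priority, long seq) {
      this.method = method;
      this.priority = priority;
      this.seq = seq;
    }

    @Override
    public int compareTo(Call other) {
      int byPriority = Integer.compare(priority, other.priority);
      // Ties are broken by arrival order so each level stays FIFO.
      return byPriority != 0 ? byPriority : Long.compare(seq, other.seq);
    }
  }

  private final PriorityBlockingQueue<Call> queue = new PriorityBlockingQueue<>();
  private final AtomicLong nextSeq = new AtomicLong();

  // Enqueue a call identified only by its method name.
  void enqueue(String method) {
    queue.put(new Call(method, priorityOf(method), nextSeq.getAndIncrement()));
  }

  // Return the method name of the next call to serve, or null if none is queued.
  String pollNextMethod() {
    Call call = queue.poll();
    return call == null ? null : call.method;
  }
}
```

With this ordering, a {{sendLifeline}} call enqueued behind a backlog of IBRs
and heartbeats is still the first one handed to a handler thread. Strict
priority like this can starve the lower levels under sustained load, so a real
design would probably need some form of weighting between levels.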
> DataNode Lifeline Protocol: an alternative protocol for reporting DataNode
> liveness
> -----------------------------------------------------------------------------------
>
> Key: HDFS-9239
> URL: https://issues.apache.org/jira/browse/HDFS-9239
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Attachments: DataNode-Lifeline-Protocol.pdf, HDFS-9239.001.patch
>
>
> This issue proposes introduction of a new feature: the DataNode Lifeline
> Protocol. This is an RPC protocol that is responsible for reporting liveness
> and basic health information about a DataNode to a NameNode. Compared to the
> existing heartbeat messages, it is lightweight and not prone to resource
> contention problems that can harm accurate tracking of DataNode liveness
> currently. The attached design document contains more details.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)