[ https://issues.apache.org/jira/browse/HDFS-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Nauroth updated HDFS-9239:
--------------------------------
Attachment: HDFS-9239.002.patch
I'd like to proceed with this feature, as it has been mentioned as potentially
relevant in comments on other JIRAs. I'm attaching patch v002 with just a few
small changes:
# Rebase on current trunk.
# Address comments from Anu.
# Fix a few Checkstyle warnings. I think the remaining Checkstyle warnings
flagged in the last pre-commit run are not worth addressing, but I'll review
the next pre-commit run for new warnings.
There had been a suggestion to change the existing heartbeat handling to use
tryLock. I explored this a bit, but I'm reluctant to alter mainline heartbeat
processing at all. Overall, I think this feature is less intrusive as
currently implemented, even though running another RPC server adds some
operational complexity. A tryLock-based implementation of heartbeat handling
could perhaps be done in a separate JIRA, again gated by a configuration flag,
to enable further experimentation in large clusters.
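
For anyone who wants a concrete picture of the tryLock idea, here is a minimal
sketch. It is not taken from the patch or from the NameNode code. The class,
the flag, and the method names are all hypothetical, and a plain ReentrantLock
stands in for the namesystem lock. The assumption is that a contended
heartbeat still records liveness and simply skips the heavier processing.

```java
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch of tryLock-based heartbeat handling.
 * None of these names come from HDFS; they exist only for illustration.
 */
final class TryLockHeartbeatSketch {
    enum Result { PROCESSED, LIVENESS_ONLY }

    // Stand-in for the lock that heartbeat processing contends on.
    private final ReentrantLock lock = new ReentrantLock();
    // Hypothetical configuration flag gating the tryLock behavior.
    private final boolean tryLockEnabled;
    // Last time (in ms) a heartbeat was seen from the DataNode.
    private volatile long lastSeenMillis;

    TryLockHeartbeatSketch(boolean tryLockEnabled) {
        this.tryLockEnabled = tryLockEnabled;
    }

    ReentrantLock lock() { return lock; }

    long lastSeenMillis() { return lastSeenMillis; }

    Result handleHeartbeat(long nowMillis) {
        if (!tryLockEnabled) {
            // Flag off: block until the lock is free, as a plain lock() would.
            lock.lock();
        } else if (!lock.tryLock()) {
            // Flag on and lock contended: record liveness only and return.
            lastSeenMillis = nowMillis;
            return Result.LIVENESS_ONLY;
        }
        try {
            lastSeenMillis = nowMillis;
            // Full heartbeat processing would happen here, under the lock.
            return Result.PROCESSED;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        TryLockHeartbeatSketch sketch = new TryLockHeartbeatSketch(true);
        // No other thread holds the lock, so the full path is taken.
        System.out.println(sketch.handleHeartbeat(System.currentTimeMillis()));
    }
}
```

With the flag off, the handler behaves like ordinary blocking lock
acquisition, which is why gating it by configuration would keep the default
path unchanged.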
> DataNode Lifeline Protocol: an alternative protocol for reporting DataNode
> liveness
> -----------------------------------------------------------------------------------
>
> Key: HDFS-9239
> URL: https://issues.apache.org/jira/browse/HDFS-9239
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Attachments: DataNode-Lifeline-Protocol.pdf, HDFS-9239.001.patch,
> HDFS-9239.002.patch
>
>
> This issue proposes introduction of a new feature: the DataNode Lifeline
> Protocol. This is an RPC protocol that is responsible for reporting liveness
> and basic health information about a DataNode to a NameNode. Compared to the
> existing heartbeat messages, it is lightweight and not prone to the resource
> contention problems that can currently harm accurate tracking of DataNode
> liveness. The attached design document contains more details.