[
https://issues.apache.org/jira/browse/HDFS-16902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683240#comment-17683240
]
ASF GitHub Bot commented on HDFS-16902:
---------------------------------------
tomscut commented on code in PR #5334:
URL: https://github.com/apache/hadoop/pull/5334#discussion_r1094080750
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/datanode/datanode.html:
##########
@@ -81,6 +81,7 @@
<thead>
<tr>
<th>Namenode Address</th>
+ <th>Namenode HA state</th>
Review Comment:
nit: `state` -> `State`.
> Add Namenode status to BPServiceActor metrics and improve logging in
> offerservice
> ---------------------------------------------------------------------------------
>
> Key: HDFS-16902
> URL: https://issues.apache.org/jira/browse/HDFS-16902
> Project: Hadoop HDFS
> Issue Type: Task
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
>
> Recently came across a k8s environment where some datanode pods randomly
> fail to stay connected to all namenode pods (e.g. the last heartbeat time
> sometimes stays higher than 2 hr). When any standby namenode becomes active,
> any datanode that has not been heartbeating to it for quite some time is
> unable to send any further block reports, leading to missing replicas
> immediately after namenode failover, which could only be resolved with a
> datanode pod restart.
> While the issue seems environment-specific, BPServiceActor's offer service
> could use some logging improvements. It would also be good to expose the
> namenode status in BPServiceActorInfo to identify any lag on the datanode
> side in recognizing the updated Active namenode status from heartbeats.