Viraj Jasani created HDFS-16902:
-----------------------------------
Summary: Add Namenode status to BPServiceActor metrics and improve
logging in offerservice
Key: HDFS-16902
URL: https://issues.apache.org/jira/browse/HDFS-16902
Project: Hadoop HDFS
Issue Type: Task
Reporter: Viraj Jasani
Assignee: Viraj Jasani
Recently came across a k8s environment where some datanode pods randomly fail
to stay connected to all namenode pods (e.g. the last heartbeat time sometimes
stays above 2 hr). When a new namenode becomes active, any datanode that is
not heartbeating to it cannot send it any further block reports, which
sometimes leads to missing replicas that are resolved only by restarting the
datanode pod.
While the issue seems environment-specific, BPServiceActor's offerService
could use some logging improvements. It would also be good to expose the
namenode status in BPServiceActorInfo, so that any lag on the datanode side in
recognizing the updated Active namenode status via heartbeats can be
identified.
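As a rough illustration of the idea, the following self-contained sketch shows
how a per-actor info map could carry the namenode state next to the heartbeat
age. It is not the actual BPServiceActor code: the class name, the enum, the
map keys and the staleness threshold are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ActorInfoSketch {
    // Hypothetical HA states a datanode might record from heartbeat responses.
    enum NamenodeState { ACTIVE, STANDBY, UNKNOWN }

    // Builds a metrics-style map for one actor. Key names are illustrative
    // and are not claimed to match the keys Hadoop actually publishes.
    static Map<String, String> actorInfo(String nnAddress, NamenodeState state,
                                         long lastHeartbeatMillis, long nowMillis) {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("NamenodeAddress", nnAddress);
        info.put("NamenodeHaState", state.name());
        info.put("LastHeartbeatSecondsAgo",
                 String.valueOf((nowMillis - lastHeartbeatMillis) / 1000));
        return info;
    }

    // True when the last successful heartbeat is older than the threshold.
    static boolean isStale(long lastHeartbeatMillis, long nowMillis,
                           long thresholdMillis) {
        return nowMillis - lastHeartbeatMillis > thresholdMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long twoHoursAgo = now - 2 * 60 * 60 * 1000L;
        // An actor that last heard from its namenode two hours ago.
        System.out.println(actorInfo("nn-1:8020", NamenodeState.ACTIVE, twoHoursAgo, now));
        // Assumed 10-minute threshold, chosen only for the example.
        System.out.println(isStale(twoHoursAgo, now, 10 * 60 * 1000L));
    }
}
```

With the state and the heartbeat age side by side, an operator could spot a
datanode that has not heartbeated to the current Active namenode for a long
time, or that still sees it as standby.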