[
https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688857#comment-17688857
]
ASF GitHub Bot commented on HDFS-16918:
---------------------------------------
virajjasani commented on PR #5396:
URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1430828452
> If the datanode is connected to observer namenode, it can serve requests,
why we need to shutdown
The observer namenode is a different case. I was actually thinking
about making this include the observer namenode too, i.e. if the datanode has
not received a heartbeat from the observer or active namenode in the last
e.g. 30s or so, then it should shut down. This is an option; no issues with it.
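Roughly, the check being described would look something like the sketch below.
This is only an illustration and not the PR code; the class name, time source,
and semantics of the timeout are my own assumptions.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch only (not the PR implementation): tracks the last time a heartbeat
 * response was received from an ACTIVE or OBSERVER namenode and decides
 * whether the datanode should shut itself down once that age exceeds a
 * configured, opt-in timeout. All names here are hypothetical.
 */
public class HeartbeatLivenessMonitor {

  private final long timeoutMs; // opt-in: <= 0 means the feature is disabled
  private final AtomicLong lastGoodHeartbeatMs =
      new AtomicLong(System.currentTimeMillis());

  public HeartbeatLivenessMonitor(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  /** Call whenever a heartbeat response arrives from ACTIVE or OBSERVER. */
  public void recordGoodHeartbeat() {
    lastGoodHeartbeatMs.set(System.currentTimeMillis());
  }

  /** True only if the opt-in timeout elapsed without hearing from either. */
  public boolean shouldShutdown() {
    if (timeoutMs <= 0) {
      return false;
    }
    return System.currentTimeMillis() - lastGoodHeartbeatMs.get() > timeoutMs;
  }
}
```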
> Even if it is connected to standby, a failover happens and it will be in
good shape, else if you restart a bunch of datanodes, the new namenode will be
flooded by block reports and just increasing problems.
This problem would occur only if we select a fairly low value. The
recommendation is to set this config value high enough to include extra time
for a namenode failover.
> If something gets messed up with Active namenode, you shutdown all, the BR
are already heavy, you forced all other namenodes to handle them again, making
failover more difficult. and if it is some faulty datanodes which lost
connection, you didn't get that alarmed, and all Standby and Observers will
keep on getting flooded by BRs, so in case Active NN literally dies and tries
to failover to any of the Namenode which these Datanodes were connected, will
be fed with unnecessary loads of BlockReports. (BR has an option of initial
delay as well, it isn't like all bombard at once and you are sorted in 5-10
mins)
The moment the active namenode becomes unhealthy or dies is exactly when
the availability of the HDFS service can be impacted. So either the observer
namenode takes care of read requests in the meantime, or the failover needs to
happen. If neither of those happens, it's the datanode that is not really
useful by staying in the cluster for a longer duration. Let's say the namenode
goes bad and the failover does take time; the new active namenode is going to
take time processing BRs anyway, right?
> If something got messed with the datanode, that is why it isn't able to
connect to Active. If something is in Memory not persisted to disk, or some JMX
parameter or N/W parameters which can be used to figure out things gets lost.
Do you mean the hsync vs hflush kind of thing for in-progress files? Is that
not already taken care of?
> That is the reason most cluster administrator in not so cool situations,
show XYZ datanode is unhealthy or not, if in some case they don't it should be
handled over there.
The response from the cluster admin applications would take time. Why not
let the datanode auto-heal itself? Also, it's not that this change is going to
terminate the datanode abruptly; it is going to shut down properly.
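To make the "shut down properly" point concrete, here is a minimal sketch of
an orderly stop as opposed to killing the process outright. The component list
is a hypothetical stand-in, not the datanode's actual shutdown path.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

/**
 * Sketch only: illustrates "shut down properly" vs. abrupt termination.
 * The component list is a hypothetical stand-in for the datanode's real
 * services (data transfer server, IPC server, block pool actors, ...).
 */
final class GracefulShutdownSketch {

  static void gracefulShutdown(List<Closeable> componentsInStopOrder) {
    // Stop accepting new work first, then close the remaining components,
    // letting each one drain/flush instead of dropping work on the floor.
    for (Closeable c : componentsInStopOrder) {
      try {
        c.close();
      } catch (IOException e) {
        // Log and keep going so one bad component doesn't block shutdown.
        System.err.println("Failed to close " + c + ": " + e);
      }
    }
    // Only after an orderly stop does the process exit.
    System.exit(0);
  }
}
```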
> In case of shared datanodes in a federated setup, say it is connected to
Active for one Namespace and has completely lost touch with another, then?
Restart to get both working? Don't restart so that at least one stays working?
Both are correct in their own ways and situations, and the datanode shouldn't be
in a state to decide its fate for such reasons.
IMO any namespace that is not connected to the active namenode is not up for
serving requests from the active namenode and hence is not in a good state. I
get your point, but in a federated setup the health of a datanode should be
determined based on whether all BPs are connected to their active namenodes;
is that not the real factor determining the health of the datanode?
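As a sketch of that "all BPs connected to active" view of health in a
federated setup (self-contained, hypothetical types rather than the real
per-block-pool bookkeeping in the datanode):

```java
import java.util.List;

/**
 * Sketch only: in a federated setup the datanode registers with one block
 * pool per nameservice. The idea discussed above is that overall health
 * requires every block pool to have heard from its active namenode recently.
 * BlockPoolState is a hypothetical stand-in for the real bookkeeping.
 */
final class FederatedHealthCheck {

  /** Hypothetical per-namespace view of the last active-NN heartbeat. */
  static final class BlockPoolState {
    final String blockPoolId;
    final long lastActiveHeartbeatMs;

    BlockPoolState(String blockPoolId, long lastActiveHeartbeatMs) {
      this.blockPoolId = blockPoolId;
      this.lastActiveHeartbeatMs = lastActiveHeartbeatMs;
    }
  }

  /** Healthy only if all block pools heard from their active NN in time. */
  static boolean allBlockPoolsHealthy(List<BlockPoolState> pools,
                                      long nowMs, long timeoutMs) {
    for (BlockPoolState bp : pools) {
      if (nowMs - bp.lastActiveHeartbeatMs > timeoutMs) {
        return false; // this namespace lost touch with its active namenode
      }
    }
    return true;
  }
}
```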
> Making anything configurable doesn't justify having it in. if we are
letting any user to use this via any config as well, then we should be sure
enough it is necessary and good thing to do, we can not say ohh you configured
it, now it is your problem...
I am not making the claim only on the basis of this being made configurable,
but making it configurable is a reasonable way to determine the best course of
action for a given situation. The only recommendation I have is: the user
should be able to have the datanode decide whether it should shut down
gracefully when it has not heard anything from the active or observer
namenode for the past x sec (50/60s or so).
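For illustration only, the opt-in could be surfaced as configuration read via
the standard Hadoop Configuration API; the property names below are
hypothetical placeholders, not keys from the patch.

```java
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

// Sketch only: hypothetical config keys for the opt-in behavior.
public class ShutdownOnLostActiveConfig {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Operators would set these in hdfs-site.xml; defaults keep the feature off.
    boolean enabled = conf.getBoolean(
        "dfs.datanode.shutdown.on.lost.active.enabled", false);
    // getTimeDuration accepts values like "60s"; pick something comfortably
    // larger than the expected failover window.
    long timeoutMs = conf.getTimeDuration(
        "dfs.datanode.shutdown.on.lost.active.timeout", 60_000L,
        TimeUnit.MILLISECONDS);
    System.out.println("enabled=" + enabled + ", timeoutMs=" + timeoutMs);
  }
}
```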
I have tried my best to answer the above questions. Please also take a look
at the Jira/PR description, where this idea has been taken from. We have seen
issues with specific infra where, until the datanodes are manually shut down,
we don't see any hope of improving availability; this has happened multiple
times. Please keep in mind that cluster administrators in cloud-native
environments do not have access to JMX metrics due to security constraints.
Really appreciate all your points and suggestions Ayush, please take a look
again.
> Optionally shut down datanode if it does not stay connected to active namenode
> ------------------------------------------------------------------------------
>
> Key: HDFS-16918
> URL: https://issues.apache.org/jira/browse/HDFS-16918
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
>
> While deploying HDFS on an Envoy proxy setup, depending on the socket timeout
> configured at Envoy, network connection issues or packet loss can be
> observed. All of the envoys basically form a transparent communication mesh in
> which each app sends and receives packets to and from localhost and is
> unaware of the network topology.
> The primary purpose of Envoy is to make the network transparent to
> applications and to help identify network issues reliably. However,
> sometimes such a proxy-based setup can result in socket connection issues
> between the datanode and namenode.
> Many deployment frameworks provide auto-start functionality when any of the
> Hadoop daemons are stopped. If a given datanode does not stay connected to the
> active namenode in the cluster, i.e. does not receive a heartbeat response in
> time from the active namenode (even though the active namenode is not
> terminated), it is not of much use. We should be able to provide configurable
> behavior such that if a given datanode cannot receive a heartbeat response
> from the active namenode within a configurable time duration, it should
> terminate itself to avoid impacting the availability SLA. This is specifically
> helpful when the underlying deployment or observability framework (e.g. K8S)
> can start up the datanode automatically upon its shutdown (unless it is being
> restarted as part of a rolling upgrade) and help the newly brought up datanode
> (in the case of K8s, a new pod with dynamically changing nodes) establish a
> new socket connection to the active and standby namenodes. This should be an
> opt-in behavior and not the default one.