ayushtkn commented on PR #5396:
URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1430800093

   By Admin I mean cluster administrator services: they can keep track of datanodes and decide what needs to be done with a datanode. If those services can trigger a restart when a datanode shuts down, they can also track the situations in which a datanode needs to be restarted.
   
   I haven't checked the code, but a few comments:
   
   - If the datanode is connected to an Observer namenode, it can still serve requests, so why do we need to shut it down?
   - Even if it is only connected to a Standby, a failover happens and it will be in good shape; whereas if you restart a bunch of datanodes, the new Active namenode will be flooded with block reports, which just adds to the problems.
   - If something gets messed up with the Active namenode and you shut down all the datanodes, the block reports, which are already heavy, are forced onto all the other namenodes again, making failover harder. And if it is just some faulty datanodes that lost their connection, you don't get alerted about them, and all the Standby and Observer namenodes keep getting flooded with block reports; so if the Active NN literally dies and failover goes to any of the namenodes these datanodes were connected to, it will be fed an unnecessary load of block reports. (Block reports also have an initial-delay option, so it isn't as though they all bombard at once and you are sorted in 5-10 minutes; see the config sketch after this list.)
   - Maybe something got messed up on the datanode itself, and that is why it isn't able to connect to the Active. If you restart it, anything that is in memory and not persisted to disk, or JMX or network parameters that could be used to figure out what went wrong, gets lost.
   - That is why most cluster administration tools, in not-so-good situations, show whether a given datanode is unhealthy or not; if in some case they don't, it should be handled over there, not in the datanode.
   - In the case of shared datanodes in a federated setup, say a datanode is connected to the Active for one namespace and has completely lost touch with another, then what? Restart it to get both working? Don't restart it so that at least one stays working? Both are correct in their own ways and situations, and the datanode shouldn't be the one deciding its own fate for such reasons.
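
   For the initial-delay point above, a minimal hdfs-site.xml sketch; `dfs.blockreport.initialDelay` is the standard HDFS property, but the value shown is only illustrative, and as I understand it the datanode picks a random delay up to this value for its first report (check hdfs-default.xml for your release):

   ```xml
   <!-- Stagger each datanode's first full block report after it (re)registers with a
        namenode, so a mass restart does not bombard the NN all at once.
        600 seconds is an illustrative value, not a recommendation. -->
   <property>
     <name>dfs.blockreport.initialDelay</name>
     <value>600</value>
   </property>
   ```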
   
   We do terminate the Namenode under a bunch of conditions for sure; I don't want to get deep into those reasons, but it is more or less a preventive measure to terminate the Namenode if something serious has happened. By the architecture of HDFS itself, doing the same for datanodes doesn't look very valid.
   
   PS. Making anything configurable doesn't justify having it in. If we are letting any user use this via a config, then we should be sure enough that it is necessary and a good thing to do; we cannot say, oh, you configured it, now it is your problem...
   
   I would say this is just pulling those cluster-administrator responsibilities into the datanode, i.e. things that Cloudera Manager or maybe Ambari should do.
   
   Not in favour of this...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
