sodonnel commented on pull request #3675:
URL: https://github.com/apache/hadoop/pull/3675#issuecomment-982018340


   For me, DECOMMISSION_IN_PROGRESS + DEAD is an error state that means 
decommission has effectively failed. There is a case where it can complete, but 
what does that really mean? If the node is dead, it has not been gracefully
stopped. If it wasn't for the way decommission is triggered using the hosts 
files, I would suggest switching it back to IN_SERVICE + DEAD, and let it be 
treated like any other dead host.
   
   If you have some monitoring tool tracking the decommission, and it sees 
"DECOMMISSIONED", then it assumes the decommission went fine. 
   
   If it sees DECOMMISSION_IN_PROGRESS + DEAD, then it's a flag that the admin 
needs to go and look into it, as it should not have happened. Perhaps they need 
to bring the node back, or conclude that the cluster is still OK without it (no 
missing blocks), add it to the exclude list and forget about it.
   
   My feeling is that the priority queue idea adds more complexity to an 
already hard-to-follow process and code area, and I wonder if it is better to 
just remove the node from the monitor and let it be dealt with manually, which 
may be required a lot of the time anyway?
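
   For comparison with a priority queue, the simpler alternative could look roughly like this: on each pass, a tracked node that has gone dead is dropped from the monitor and recorded for manual handling, with no re-queueing. This is a toy model under stated assumptions; `SimpleDecommissionMonitor`, `markDead` and `tick` are hypothetical names, not HDFS code, and the real monitor's block-replication checks are omitted.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: a node that dies while DECOMMISSION_IN_PROGRESS is
 * removed from the tracked set and handed off for manual action instead of
 * being re-queued. Names are illustrative, not HDFS code.
 */
final class SimpleDecommissionMonitor {

    /** Tracked nodes: hostname -> whether the node is currently alive. */
    private final Map<String, Boolean> tracked = new LinkedHashMap<>();

    /** Nodes dropped from monitoring because they died mid-decommission. */
    private final List<String> needsManualAction = new ArrayList<>();

    void startTracking(String host) {
        tracked.put(host, true);
    }

    void markDead(String host) {
        // Only affects hosts we are already tracking.
        tracked.computeIfPresent(host, (h, alive) -> false);
    }

    /** One monitor pass: stop tracking dead nodes, keep live ones. */
    void tick() {
        Iterator<Map.Entry<String, Boolean>> it = tracked.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Boolean> e = it.next();
            if (!e.getValue()) {
                // No retry or re-prioritisation: surface it to the admin and stop.
                needsManualAction.add(e.getKey());
                it.remove();
            }
            // Live nodes would continue their normal replication checks here.
        }
    }

    List<String> trackedHosts() {
        return new ArrayList<>(tracked.keySet());
    }

    List<String> needsManualAction() {
        return new ArrayList<>(needsManualAction);
    }
}
```

   The design trade-off is that the monitor never has to decide how to order or retry dead nodes; that decision moves to the admin, who often has to step in anyway.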


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
