ayushtkn commented on PR #5396: URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1433706087
I think I have lost the flow now 😅 But I think using getDataNodeStats is a cool thing to explore: it runs under a read lock, so it is not costly either, and it might be easier to process as well.

"Usually", around metrics, if something can be derived from the ones already exposed, we don't coin new ones. There are tools that can do that logic for you, combine metrics, do maths on them, and show you fancy graphs.

The DN case seems like a corner case. It won't be very common, and we need to be careful not to let a split-brain scenario get past us. There are a bunch of checks around that, but they only verify that we don't acknowledge a false active claim.

Still, thinking about this case, it can be figured out with simple logic or a script: if two NameNodes claim to be active, the one with the smaller last response time (reading it as time elapsed since the last response) can be used for those decisions. Something like:

```
Variables to store: activeNnId = none, lastActiveResponseTime = MAX
Fetch metrics from DN
Iterate over all NameNodes:
    if NN claims Active and nnLastResponseTime < lastActiveResponseTime:
        store the nnId and its last response time
    else:
        move forward
if lastActiveResponseTime > configurable value:
    conclude dead and do <Whatever>
```

Some if/else or comparison may have got inverted; this is just for the idea's sake...
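A minimal Python sketch of the selection logic described in the comment, for illustration only. It assumes the DN metrics have already been fetched and parsed into simple records; the record shape, the field names (`nn_id`, `state`, `last_response_secs`), and the threshold are hypothetical and are not actual Hadoop metric names. It also assumes the last response time is the number of seconds elapsed since the DN last heard from that NameNode.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class NnStatus:
    """One NameNode as seen by a DataNode (hypothetical record shape)."""
    nn_id: str
    state: str                 # e.g. "active" or "standby"
    last_response_secs: float  # assumed: seconds since the DN last heard from this NN


def pick_active(statuses: Iterable[NnStatus]) -> Tuple[Optional[str], float]:
    """Return (nn_id, last_response_secs) of the most recently heard active claimant."""
    active_nn_id = None
    last_active_response = math.inf  # plays the role of the "MAX" sentinel
    for nn in statuses:
        claims_active = nn.state.lower() == "active"
        if claims_active and nn.last_response_secs < last_active_response:
            active_nn_id = nn.nn_id
            last_active_response = nn.last_response_secs
    return active_nn_id, last_active_response


def active_unreachable(statuses: Iterable[NnStatus], threshold_secs: float) -> bool:
    """True if no active claimant responded within threshold_secs (or none claims active)."""
    _, last_active_response = pick_active(statuses)
    return last_active_response > threshold_secs
```

With two active claimants, say `nn1` last heard 120 seconds ago and `nn2` last heard 3 seconds ago, `pick_active` selects `nn2`, and `active_unreachable` with a 60-second threshold returns `False`. If no NameNode claims active, the sentinel stays at infinity and the check reports the active as unreachable.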