xBis7 commented on PR #4362:
URL: https://github.com/apache/ozone/pull/4362#issuecomment-1491920080

   @adoroszlai The latest changes fix the timeout issue. I've launched multiple 
workflows and it no longer occurs. However, this revealed another underlying 
issue that might not be related to the test at all: during a leader change, the 
metrics don't get updated. 
   
   `OMHAMetrics` relies on `OzoneManager.updatePeerList()` being called; at the 
end of this method we unregister the metrics and then register them again. It 
was my understanding that every time an OM is started, stopped or restarted, 
there is a conf change and `OMStateMachine` calls that method. That doesn't 
seem to be the case.
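
   To illustrate the suspected behaviour, here is a minimal sketch. It is not 
the actual Ozone code: `StaleMetricsSketch`, `Node`, `REGISTRY`, 
`refreshMetrics()` and `onLeaderChange()` are hypothetical names, and a plain 
map stands in for the metrics system. It only shows how a published value can 
go stale when the unregister/register step is tied to a hook that a leader 
change does not trigger.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleMetricsSketch {

  // Hypothetical stand-in for a metrics registry (not the Hadoop metrics API).
  static final Map<String, String> REGISTRY = new HashMap<>();

  // Hypothetical stand-in for an OM node that publishes its HA role.
  static final class Node {
    final String id;
    String leaderId;

    Node(String id, String leaderId) {
      this.id = id;
      this.leaderId = leaderId;
      refreshMetrics();
    }

    // Mirrors the described pattern: unregister the metrics, then
    // register them again using the current state.
    void refreshMetrics() {
      REGISTRY.remove(id);
      REGISTRY.put(id, id.equals(leaderId) ? "LEADER" : "FOLLOWER");
    }

    // Assumption for illustration: a leader change updates internal
    // state but never reaches the refresh hook.
    void onLeaderChange(String newLeaderId) {
      this.leaderId = newLeaderId;
    }
  }

  // Returns the role currently published for the given node id.
  static String publishedRole(String id) {
    return REGISTRY.get(id);
  }

  public static void main(String[] args) {
    Node om1 = new Node("om1", "om1");
    System.out.println(publishedRole("om1")); // LEADER

    om1.onLeaderChange("om2");
    System.out.println(publishedRole("om1")); // LEADER (stale)

    om1.refreshMetrics();
    System.out.println(publishedRole("om1")); // FOLLOWER
  }
}
```

   In this sketch the published role only catches up once `refreshMetrics()` 
is called explicitly, which matches what the failing runs suggest.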
   
   Here are the latest four workflow runs; none of them has a timeout failure. 
All failures are due to the metrics not getting updated.
   
   https://github.com/xBis7/ozone/actions/runs/4566947556
   
   https://github.com/xBis7/ozone/actions/runs/4567066352
   
   https://github.com/xBis7/ozone/actions/runs/4574892711
   
   https://github.com/xBis7/ozone/actions/runs/4574961334


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

