chihsuan commented on PR #10556:
URL: https://github.com/apache/ozone/pull/10556#issuecomment-4760709917

   Hi @adoroszlai, after looking into it, I found a second, separate cause 
of the flakiness: 
   
   In `testOnMessage`, `setNodeOperationalState(IN_SERVICE)` on a `DEAD` node 
fires a `DEAD_NODE` event, and SCM's own `DeadNodeHandler` then removes the 
node from the topology asynchronously, racing with the handlers under test (the 
`Parent == null` NPE). A single-method run never opened this window.
   
   I fixed it by draining SCM's event queue with `processAll` after the state 
change, so the async handler completes before the handlers under test run. 
This is the same idiom used in `TestSCMNodeManager`.
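   
   To illustrate the idiom outside of Ozone, here is a minimal, self-contained sketch. `TinyEventQueue` and its methods are hypothetical stand-ins written for this example, not SCM's actual event queue; only the pattern (fire an async event, drain the queue, then continue) mirrors the fix described above.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class DrainQueueSketch {

  /** Hypothetical stand-in for an asynchronous event queue (not Ozone code). */
  static final class TinyEventQueue implements AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final AtomicLong queued = new AtomicLong();
    private final AtomicLong processed = new AtomicLong();

    /** Queues a handler to run on a background thread. */
    void fireEvent(Runnable handler) {
      queued.incrementAndGet();
      executor.execute(() -> {
        try {
          handler.run();
        } finally {
          processed.incrementAndGet();
        }
      });
    }

    /** Blocks until every queued event has been handled, or fails on timeout. */
    void processAll(long timeoutMillis) {
      long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
      while (processed.get() < queued.get()) {
        if (System.nanoTime() - deadline > 0) {
          throw new IllegalStateException("events not drained in " + timeoutMillis + " ms");
        }
        try {
          Thread.sleep(1);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while draining", e);
        }
      }
    }

    @Override
    public void close() {
      executor.shutdownNow();
    }
  }

  /** Fires an async "dead node" removal, drains the queue, then inspects the topology. */
  public static boolean nodeInTopologyAfterDrain() {
    Set<String> topology = ConcurrentHashMap.newKeySet();
    topology.add("dn-1");
    try (TinyEventQueue queue = new TinyEventQueue()) {
      // The state change triggers an asynchronous removal of the node.
      queue.fireEvent(() -> topology.remove("dn-1"));
      // Drain first, so the removal is finished before the test goes on.
      queue.processAll(1000);
      return topology.contains("dn-1");
    }
  }

  public static void main(String[] args) {
    System.out.println("node in topology after drain: " + nodeInTopologyAfterDrain());
  }
}
```

   Without the `processAll` call, the read of `topology` could land before or after the background removal, which is the kind of window that only shows up when several test methods run together.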
   
   Re-ran flaky-test-check with `test-name=ALL` (100 runs): all green.
   
   https://github.com/chihsuan/ozone/actions/runs/27890936708
   
   Please take another look, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


