chihsuan commented on PR #10556: URL: https://github.com/apache/ozone/pull/10556#issuecomment-4760709917
Hi @adoroszlai,

After looking into it, I found a second, separate cause of the flakiness. In `testOnMessage`, calling `setNodeOperationalState(IN_SERVICE)` on a `DEAD` node fires a `DEAD_NODE` event. SCM's own `DeadNodeHandler` then removes the node from the topology asynchronously, racing with the handlers under test (hence the `Parent == null` NPE). A single-method run never opened this window.

I fixed it by draining SCM's event queue with `processAll` after the state change, so the async handler completes first. This is the same idiom used in `TestSCMNodeManager`.

I re-ran flaky-test-check with `test-name=ALL` (100 runs): all green. https://github.com/chihsuan/ozone/actions/runs/27890936708

Please take another look, thanks!
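
For readers unfamiliar with the drain idiom described above, here is a minimal, self-contained sketch of the idea. It is not the code from the PR and does not use Ozone's classes: `TinyEventQueue`, `DrainSketch`, the node name `dn1`, and the timeout values are all hypothetical stand-ins. It only illustrates why waiting for queued handlers to finish removes the race: the asynchronous "dead node" handler has already mutated the topology by the time the test continues.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class DrainSketch {

  /** Hypothetical stand-in for an asynchronous event queue (not Ozone's EventQueue). */
  static final class TinyEventQueue {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final AtomicLong queued = new AtomicLong();
    private final AtomicLong done = new AtomicLong();

    /** Queues a handler to run on a background thread. */
    void fireEvent(Runnable handler) {
      queued.incrementAndGet();
      executor.submit(() -> {
        try {
          handler.run();
        } finally {
          done.incrementAndGet();
        }
      });
    }

    /** Blocks until every event fired so far has been handled, or fails on timeout. */
    void processAll(long timeoutMillis) {
      long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
      while (done.get() < queued.get()) {
        if (System.nanoTime() - deadline > 0) {
          throw new AssertionError("Timed out waiting for queued events");
        }
        sleepQuietly(5);
      }
    }

    void close() {
      executor.shutdownNow();
    }
  }

  static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Returns whether the node is still in the topology once the drain has returned. */
  static boolean nodeInTopologyAfterDrain() {
    Set<String> topology = ConcurrentHashMap.newKeySet();
    topology.add("dn1");
    TinyEventQueue queue = new TinyEventQueue();
    try {
      // Models a state change that fires an event whose handler
      // removes the node asynchronously, some time later.
      queue.fireEvent(() -> {
        sleepQuietly(50);
        topology.remove("dn1");
      });
      // Drain: wait for the async handler before the test proceeds.
      queue.processAll(1000);
      return topology.contains("dn1");
    } finally {
      queue.close();
    }
  }

  public static void main(String[] args) {
    System.out.println("node in topology after drain: " + nodeInTopologyAfterDrain());
  }
}
```

Without the `processAll` call in this sketch, the check could run before or after the removal, which is the kind of nondeterminism the comment describes.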
