errose28 commented on pull request #2549:
URL: https://github.com/apache/ozone/pull/2549#issuecomment-902128424


   Thanks to everyone who has helped investigate this. Here is my current 
understanding of how this test is supposed to work:
   
   1. Pipeline created
   2. Log failure triggered on datanode
   3. Datanode is expected to notify SCM of pipeline close without waiting for 
the heartbeat interval.
   4. Test checks SCM event queue to see if the mock pipeline action handler it 
registered was called within the timeout.
   
   So although the mini ozone conf being set with the heartbeat interval in the 
wrong order is bad, it should have no bearing on this test. I verified this by 
setting the heartbeat interval to 10 seconds. The test behaves the same.
   
   By turning on trace logging for the EventQueue, we can see that even in the 
failure cases both the original PipelineActionHandler and the mocked 
PipelineActionHandler are triggered on SCM pretty quickly after the DN log 
fails. This gives me reasonable confidence that the code is working as expected 
and the error was in the test. So since the test seems to pass repeatedly with 
the new timeout, I am ok with the fix.
   
   However, I do not fully understand why we had to increase the timeout to 1 
second. SCM event queue fires asynchronuously and I verified the heartbeat 
interval is not blocking the datanode. I guess it is possible that DN execution 
time + loopback RPC latency + SCM execution time takes up to 1 second 
sometimes, although this seems kind of high since it previously finished under 
100ms ~29/30 times, and under 500ms ~139/140 times.
   
   TLDR I am reasonably confident the code is working correctly and this test 
fix is ok, but why the test fix was needed in the first place is still a bit of 
a mystery to me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to