xichen01 opened a new pull request, #5886:
URL: https://github.com/apache/ozone/pull/5886

   ## What changes were proposed in this pull request?
   Fix unstable integration tests.
   
   Repeated test runs uncovered several things that can cause `TestBlockDeletion` to fail:
   1. `hdds.datanode.block.delete.queue.limit` defaults to 5. Tasks that cannot 
be added to the full queue may be discarded, which causes the test to time out.
   2. `restartHddsDatanode` in `TestBlockDeletion.testBlockDeletion` can 
sometimes restart the DN before it has sent the 
`DeleteBlockTransactionResult` to the SCM.
   
https://github.com/apache/ozone/blob/4eca52b6b76b0737bfb2d6ed94097969a0737a45/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/TestBlockDeletion.java#L299-L310
   
   3. `SCMBlockDeletingService#notifyStatusChanged` may be executed several 
times, which sets `serviceStatus` from `RUNNING` back to `PAUSING` and stops 
`SCMBlockDeletingService` from working. In the log below, SCM exits safe mode 
twice and the service is notified each time:
   ```
   2023-12-28 10:42:20,496 [EventQueue-OpenPipelineForHealthyPipelineSafeModeRule] INFO  safemode.SCMSafeModeManager (SCMSafeModeManager.java:exitSafeMode(244)) - SCM exiting safe mode.  <<--- First
   2023-12-28 10:42:20,496 [EventQueue-OpenPipelineForHealthyPipelineSafeModeRule] INFO  ha.SCMContext (SCMContext.java:updateSafeModeStatus(230)) - Update SafeModeStatus from SafeModeStatus{safeModeStatus=true, preCheckPassed=true} to SafeModeStatus{safeModeStatus=false, preCheckPassed=true}.
   //...
   2023-12-28 10:42:20,497 [EventQueue-OpenPipelineForHealthyPipelineSafeModeRule] DEBUG ha.SCMServiceManager (SCMServiceManager.java:notifyStatusChanged(51)) - Notify service:SCMBlockDeletingService.
   //...
   2023-12-28 10:42:20,499 [EventQueue-PipelineReportForOneReplicaPipelineSafeModeRule] INFO  safemode.SCMSafeModeManager (SCMSafeModeManager.java:validateSafeModeExitRules(215)) - ScmSafeModeManager, all rules are successfully validated
   2023-12-28 10:42:20,499 [EventQueue-PipelineReportForOneReplicaPipelineSafeModeRule] INFO  safemode.SCMSafeModeManager (SCMSafeModeManager.java:exitSafeMode(244)) - SCM exiting safe mode.   <<--- Second
   2023-12-28 10:42:20,499 [EventQueue-PipelineReportForOneReplicaPipelineSafeModeRule] INFO  ha.SCMContext (SCMContext.java:updateSafeModeStatus(230)) - Update SafeModeStatus from SafeModeStatus{safeModeStatus=false, preCheckPassed=true} to SafeModeStatus{safeModeStatus=false, preCheckPassed=true}.
   //...
   2023-12-28 10:42:20,500 [EventQueue-PipelineReportForOneReplicaPipelineSafeModeRule] DEBUG ha.SCMServiceManager (SCMServiceManager.java:notifyStatusChanged(51)) - Notify service:SCMBlockDeletingService.
   //...
   ```
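
To illustrate point 3, here is a minimal, hypothetical sketch of how a repeated notification can flip a status back to `PAUSING`. It is not the actual Ozone code. It assumes the handler only takes the `RUNNING` branch when the service is not already running, so a second identical notification falls through to the `else` branch. The names `StatusToggleSketch`, `FragileService`, and `IdempotentService` are made up for this example.

```java
public class StatusToggleSketch {

    enum ServiceStatus { RUNNING, PAUSING }

    /**
     * Hypothetical handler (assumption, not the Ozone implementation):
     * the "already RUNNING" case shares the else-branch with "should pause".
     */
    static class FragileService {
        ServiceStatus status = ServiceStatus.PAUSING;

        void notifyStatusChanged(boolean leaderReady, boolean inSafeMode) {
            if (leaderReady && !inSafeMode && status != ServiceStatus.RUNNING) {
                status = ServiceStatus.RUNNING;
            } else {
                // Also reached when the service is already RUNNING.
                status = ServiceStatus.PAUSING;
            }
        }
    }

    /**
     * Hypothetical idempotent variant: the new status depends only on the
     * inputs, so repeating the same notification cannot change the result.
     */
    static class IdempotentService {
        ServiceStatus status = ServiceStatus.PAUSING;

        void notifyStatusChanged(boolean leaderReady, boolean inSafeMode) {
            status = (leaderReady && !inSafeMode)
                    ? ServiceStatus.RUNNING
                    : ServiceStatus.PAUSING;
        }
    }

    public static void main(String[] args) {
        FragileService fragile = new FragileService();
        IdempotentService idempotent = new IdempotentService();

        // Two identical "leader ready, safe mode exited" notifications,
        // mirroring the two exitSafeMode events in the log.
        for (int i = 1; i <= 2; i++) {
            fragile.notifyStatusChanged(true, false);
            idempotent.notifyStatusChanged(true, false);
            System.out.println("call " + i
                    + ": fragile=" + fragile.status
                    + ", idempotent=" + idempotent.status);
        }
    }
}
```

Under these assumptions, the fragile handler ends in `PAUSING` after the second notification, while the idempotent variant stays in `RUNNING`.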
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/HDDS-9962
   
   ## How was this patch tested?
   
   Existing tests.
   Two runs of 25 * 15 tests all succeeded:
   https://github.com/xichen01/ozone/actions/runs/7353980620/attempts/1
   https://github.com/xichen01/ozone/actions/runs/7353980620
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

