neils-dev opened a new pull request #2474:
URL: https://github.com/apache/ozone/pull/2474


   ## What changes were proposed in this pull request?
   `TestOMRatisSnapshots#testInstallSnapshot` fails intermittently.  This PR 
includes a small patch to the integration test to fix the intermittent failure.  
The problem occurs when transactions are added to the leader OM and are then 
applied to a follower OM that is moved from the inactive to the active state.  
The test activates the follower OM and expects the follower to initially have 
fewer transactions than the leader index; however, the updates are done by a 
nonblocking background thread.  Checking the OM index this way is error-prone 
because of the nonblocking install: at times the follower OM's index (its 
number of transactions) is already >= the leader's.
   
   To fix the intermittent issue, a simple wait-and-retry construct checks the 
follower index periodically until it updates, that is, until the snapshot is 
installed.  A 3 min timeout is used to propagate a fatal error.  The 
wait-and-retry block checks the state of the snapshot install and ensures the 
follower index is <= leader `lastAppliedTermIndex` - 1.
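
   The wait-and-retry idea can be sketched roughly as follows.  This is a 
minimal, self-contained illustration and not the code from the patch: the 
class `WaitAndRetrySketch`, the helper `waitFor`, the polling interval, the 
simulated index values and the polled condition are all assumptions made for 
the example.  The follower is simulated by a background thread that updates 
an `AtomicLong`, standing in for the nonblocking snapshot install.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; names and values are illustrative assumptions.
class WaitAndRetrySketch {

  // Polls `check` every `intervalMs` until it returns true, or throws
  // TimeoutException once `timeoutMs` has elapsed.
  static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!check.getAsBoolean()) {
      if (System.nanoTime() >= deadline) {
        throw new TimeoutException("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }

  public static void main(String[] args) throws Exception {
    final long leaderIndex = 100;                  // assumed leader index
    final long targetIndex = leaderIndex - 1;      // assumed index after install
    AtomicLong followerIndex = new AtomicLong(0);  // simulated follower index

    // Simulates the nonblocking background install on the follower.
    Thread installer = new Thread(() -> {
      try {
        Thread.sleep(200);
        followerIndex.set(targetIndex);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    installer.start();

    // Instead of reading the index once, poll until the install is visible.
    // The sketch uses a short timeout; the PR describes a 3 min timeout.
    waitFor(() -> followerIndex.get() >= targetIndex, 50, 5_000);
    installer.join();

    System.out.println("follower index: " + followerIndex.get());
  }
}
```

   Polling with a deadline keeps the test deterministic with respect to the 
background thread: a slow install only delays the check, and a stuck install 
surfaces as a timeout rather than as a spurious assertion failure.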
   
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/HDDS-4668
   
   ## How was this patch tested?
   Tested through the `hadoop-ozone/dev-support/checks/integration.sh` script 
in the CI environment with environment variables `$ITERATIONS=60, $MAVEN_OPTS: 
-Dhttp.keepAlive=false -Dmaven.wagon.http.pool=false 
-Dmaven.wagon.http.retryHandler.class=standard 
-Dmaven.wagon.http.retryHandler.count=3`
   
   `hadoop-ozone/dev-support/checks/integration.sh  
-Dtest=TestOMRatisSnapshots#testInstallSnapshot`
   
   Run cat target/integration/summary.txt
   Iteration 1 exit code: 0
   Iteration 2 exit code: 0
   Iteration 3 exit code: 0
   Iteration 4 exit code: 0
   Iteration 5 exit code: 0
   Iteration 6 exit code: 0
   Iteration 7 exit code: 0
   Iteration 8 exit code: 0
   Iteration 9 exit code: 0
   ...
   Iteration 60 exit code: 0
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


