adoroszlai opened a new pull request #3214:
URL: https://github.com/apache/ozone/pull/3214


   ## What changes were proposed in this pull request?
   
   `testOMRestart` verifies that a follower OM catches up to the leader OM after the follower is restarted.  It is flaky because it asserts that, right after the restart, the follower is still lagging behind the leader.
   
   Passing case:
   
   ```
   2022-03-18 22:21:54,045 [Listener at 127.0.0.1/60277] INFO  ratis.OzoneManagerRatisServer (OzoneManagerRatisServer.java:start(554)) - Starting OzoneManagerRatisServer omNode-3 at port 60274
   ...
   2022-03-18 22:21:54,341 [Listener at localhost/60277] INFO  om.TestOzoneManagerHAWithData (TestOzoneManagerHAWithData.java:testOMRestart(477)) - ZZZ leader snapshot: 543
   2022-03-18 22:21:54,341 [Listener at localhost/60277] INFO  om.TestOzoneManagerHAWithData (TestOzoneManagerHAWithData.java:testOMRestart(482)) - ZZZ follower last applied after restart: 43
   ...
   2022-03-18 22:21:54,477 [grpc-default-executor-2] INFO  server.RaftServer$Division (ServerState.java:setLeader(285)) - omNode-3@group-523986131536: change Leader from null to omNode-1 at term 1 for appendEntries, leader elected after 428ms
   ...
   2022-03-18 22:21:54,578 [omNode-3@group-523986131536-StateMachineUpdater] INFO  impl.StateMachineUpdater (StateMachineUpdater.java:lambda$new$0(89)) - omNode-3@group-523986131536-StateMachineUpdater: snapshotIndex: updateIncreasingly 43 -> 542
   ```
   
   Failing case:
   
   ```
   2022-03-18 20:52:24,821 [Listener at 127.0.0.1/58092] INFO  ratis.OzoneManagerRatisServer (OzoneManagerRatisServer.java:start(554)) - Starting OzoneManagerRatisServer omNode-3 at port 58089
   ...
   2022-03-18 20:52:25,232 [grpc-default-executor-4] INFO  server.RaftServer$Division (ServerState.java:setLeader(285)) - omNode-3@group-523986131536: change Leader from null to omNode-1 at term 1 for appendEntries, leader elected after 408ms
   ...
   2022-03-18 20:52:25,376 [omNode-3@group-523986131536-StateMachineUpdater] INFO  impl.StateMachineUpdater (StateMachineUpdater.java:lambda$new$0(89)) - omNode-3@group-523986131536-StateMachineUpdater: snapshotIndex: updateIncreasingly 43 -> 544
   ...
   2022-03-18 20:52:25,497 [Listener at localhost/58092] INFO  om.TestOzoneManagerHAWithData (TestOzoneManagerHAWithData.java:testOMRestart(477)) - ZZZ leader snapshot: 543
   2022-03-18 20:52:25,498 [Listener at localhost/58092] INFO  om.TestOzoneManagerHAWithData (TestOzoneManagerHAWithData.java:testOMRestart(482)) - ZZZ follower last applied after restart: 544
   ```
   
   In both cases the follower caught up after the restart.  Depending on timing, that happened either after the assertion (passing case) or before it (failing case).  (Lines with `ZZZ` are temporary log messages emitted just before the assertion.)
   
   This PR simply removes the flaky assertion, which is not essential for the 
test.
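
For illustration only, the race can be reproduced outside Ozone with a minimal, self-contained sketch. Nothing here is taken from `TestOzoneManagerHAWithData`: the class `CatchUpSketch`, the helper `waitFor`, and the background thread standing in for the restarted follower are all hypothetical, and the indices 43 and 543 are simply borrowed from the logs above. A point-in-time check of "follower is behind" depends on how far the background thread has progressed, while polling for "follower has caught up" does not.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not code from the Ozone test suite.
class CatchUpSketch {

  /** Polls the condition until it holds or the timeout expires. */
  static boolean waitFor(BooleanSupplier condition, long intervalMillis,
      long timeoutMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() >= deadline) {
        return false;
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws InterruptedException {
    final long leaderIndex = 543;
    AtomicLong followerLastApplied = new AtomicLong(43);

    // Stand-in for the restarted follower replaying entries in the background.
    Thread catchUp = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      followerLastApplied.set(leaderIndex);
    });
    catchUp.start();

    // Racy: the result depends on whether the background thread already ran.
    boolean laggingNow = followerLastApplied.get() < leaderIndex;
    System.out.println("lagging at this instant (timing-dependent): " + laggingNow);

    // Robust: wait for the eventual state instead of asserting a transient one.
    boolean caughtUp =
        waitFor(() -> followerLastApplied.get() >= leaderIndex, 10, 5000);
    catchUp.join();
    System.out.println("caught up: " + caughtUp);
  }
}
```

The first check corresponds to the kind of assertion this PR removes; the second corresponds to the kind of eventual check that is safe to keep.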
   
   https://issues.apache.org/jira/browse/HDDS-6469
   
   ## How was this patch tested?
   
   Repeated 100 times:
   https://github.com/adoroszlai/hadoop-ozone/runs/5607104552
   
   Regular CI:
   https://github.com/adoroszlai/hadoop-ozone/runs/5607106443#step:4:8144


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
