MarvinCai commented on a change in pull request #9443:
URL: https://github.com/apache/pulsar/pull/9443#discussion_r569627190



##########
File path: pulsar-broker/src/test/java/org/apache/pulsar/broker/loadbalance/LoadBalancerTest.java
##########
@@ -176,23 +176,30 @@ void shutdown() throws Exception {
         bkEnsemble.stop();
     }
 
-    private LeaderBroker loopUntilLeaderChanges(LeaderElectionService les, LeaderBroker oldLeader,
-            LeaderBroker newLeader) throws InterruptedException {
+    private void loopUntilLeaderChangesForAllBroker(List<PulsarService> activePulsars, LeaderBroker oldLeader)
+            throws InterruptedException {
         int loopCount = 0;
+        boolean settled;
 
         while (loopCount < MAX_RETRIES) {
             Thread.sleep(1000);
-            // Check if the new leader is elected. If yes, break without incrementing the loopCount
-            newLeader = les.getCurrentLeader().get();
-            if (newLeader.equals(oldLeader) == false) {
+            settled = true;
+            // Check if the all active pulsar see a new leader
+            for (PulsarService pulsar : activePulsars) {
+                Optional<LeaderBroker> leader = pulsar.getLeaderElectionService().readCurrentLeader().join();
+                if (leader.isPresent() && leader.get().equals(oldLeader)) {

Review comment:
   @315157973 I think the old logic picks the last follower it has seen and 
checks whether that follower sees the new leader, which is in this chunk of code 
([ref1](https://github.com/apache/pulsar/blob/fd7da5210b59fe9fd7b2619534e8122ba7b2701a/pulsar-broker/src/test/java/org/apache/pulsar/broker/loadbalance/LoadBalancerTest.java#L730),
 
[ref2](https://github.com/apache/pulsar/blob/fd7da5210b59fe9fd7b2619534e8122ba7b2701a/pulsar-broker/src/test/java/org/apache/pulsar/broker/loadbalance/LoadBalancerTest.java#L186-L188)).
   That check alone can't guarantee that every follower has already seen the 
new leader. A follower that tries to become leader first sees the empty path 
"/loadbalance/leader" and caches it as [Optional.empty], then tries to create 
the znode. If the create fails (another node has already created the znode and 
become leader), there might be some delay before the ZK watch updates its 
cache, so the old test can still see that [Optional.empty]. Looping through all 
followers to make sure each of them already sees a new leader, and then 
checking that they all see the same leader, solves the problem.
   Does it make sense?
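
To make the idea concrete, here is a minimal, self-contained sketch of the "wait until every follower agrees" check. It is not the code in this PR and does not use Pulsar APIs: `LeaderSettleSketch`, `awaitNewLeader`, and the `Supplier` views are hypothetical names standing in for each broker's cached view of the leader. An empty view is treated as "not caught up yet", which is the stale-cache case described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of Pulsar.
final class LeaderSettleSketch {

    /**
     * Polls every follower's view until all of them report the same non-empty
     * leader that differs from oldLeader, or until maxRetries polls were made.
     * Returns the agreed leader, or Optional.empty() if the views never settled.
     */
    static <T> Optional<T> awaitNewLeader(List<Supplier<Optional<T>>> views, T oldLeader,
                                          int maxRetries, long sleepMillis) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            T agreed = null;
            boolean settled = !views.isEmpty();
            for (Supplier<Optional<T>> view : views) {
                Optional<T> seen = view.get();
                // An empty (stale) view or the old leader means this follower has not caught up.
                if (!seen.isPresent() || seen.get().equals(oldLeader)) {
                    settled = false;
                    break;
                }
                if (agreed == null) {
                    agreed = seen.get();
                } else if (!agreed.equals(seen.get())) {
                    // Two followers disagree on who the new leader is.
                    settled = false;
                    break;
                }
            }
            if (settled) {
                return Optional.of(agreed);
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        AtomicInteger reads = new AtomicInteger();
        Supplier<Optional<String>> fresh = () -> Optional.of("broker-2");
        // Simulates a follower whose cache is still empty for the first two reads.
        Supplier<Optional<String>> stale = () ->
                reads.getAndIncrement() < 2 ? Optional.<String>empty() : Optional.of("broker-2");
        System.out.println(awaitNewLeader(Arrays.asList(fresh, stale), "broker-1", 10, 10L));
    }
}
```

In the real test, each view would presumably be backed by something like the `readCurrentLeader()` call in the diff above; the point of the sketch is only that an empty result must keep the loop waiting rather than count as "a new leader".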




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

