michaeljmarshall commented on code in PR #17401:
URL: https://github.com/apache/pulsar/pull/17401#discussion_r964188907


##########
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/LeaderElectionImpl.java:
##########
@@ -73,14 +76,20 @@ private enum InternalState {
         this.path = path;
         this.serde = new JSONMetadataSerdeSimpleType<>(TypeFactory.defaultInstance().constructSimpleType(clazz, null));
         this.store = store;
-        this.cache = store.getMetadataCache(clazz);
+        MetadataCacheConfig metadataCacheConfig = MetadataCacheConfig.builder()
+                .expireAfterWriteMillis(-1L)
+                .build();
+        this.cache = store.getMetadataCache(clazz, metadataCacheConfig);
         this.leaderElectionState = LeaderElectionState.NoLeader;
         this.internalState = InternalState.Init;
         this.stateChangesListener = stateChangesListener;
         this.executor = executor;
 
         store.registerListener(this::handlePathNotification);
         store.registerSessionListener(this::handleSessionNotification);
+        updateCachedValueFuture = executor.scheduleWithFixedDelay(SafeRunnable.safeRun(this::getLeaderValue),
+                metadataCacheConfig.getRefreshAfterWriteMillis() / 2,
+                metadataCacheConfig.getRefreshAfterWriteMillis(), TimeUnit.MILLISECONDS);

Review Comment:
   A call to `getLeaderValueIfPresent` will asynchronously trigger the 
`refreshAfter` logic in the loading cache. Is there a reason it is insufficient 
to rely on those calls?
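
To make the question concrete, here is a toy model of read-triggered refresh. It is only a sketch of the general idea and not the loading cache's actual implementation: the class `RefreshOnReadSketch`, its constructor parameters, and the injected clock and executor are all hypothetical names chosen for illustration. A read returns the current value immediately and, if the value is older than the refresh interval, schedules a reload. Nothing is reloaded while no one reads, which may be the case the scheduled task in this PR is meant to cover.

```java
import java.util.concurrent.Executor;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Toy single-entry cache (hypothetical): a stale value is reloaded only when a read notices it. */
final class RefreshOnReadSketch<T> {
    private final Supplier<T> loader;
    private final long refreshAfterMillis;
    private final LongSupplier clock;   // injected so the example is deterministic
    private final Executor executor;    // runs the reload, possibly on another thread

    private T value;
    private long lastWriteMillis;
    private boolean refreshing;

    RefreshOnReadSketch(Supplier<T> loader, long refreshAfterMillis,
                        LongSupplier clock, Executor executor) {
        this.loader = loader;
        this.refreshAfterMillis = refreshAfterMillis;
        this.clock = clock;
        this.executor = executor;
        this.value = loader.get();                 // initial load
        this.lastWriteMillis = clock.getAsLong();
    }

    /** Returns the cached value immediately; schedules a reload if the value is stale. */
    synchronized T getIfPresent() {
        T current = value;
        boolean stale = clock.getAsLong() - lastWriteMillis >= refreshAfterMillis;
        if (stale && !refreshing) {
            refreshing = true;                     // at most one reload in flight
            executor.execute(this::reload);
        }
        return current;                            // the caller still sees the old value
    }

    private void reload() {
        try {
            T fresh = loader.get();
            synchronized (this) {
                value = fresh;
                lastWriteMillis = clock.getAsLong();
            }
        } finally {
            synchronized (this) {
                refreshing = false;
            }
        }
    }
}
```

In this model the first read after the interval elapses still returns the old value, and only a later read observes the reloaded one.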



##########
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/LeaderElectionImpl.java:
##########
@@ -207,11 +219,6 @@ private synchronized CompletableFuture<LeaderElectionState> tryToBecomeLeader()
                         // There was a conflict between 2 participants trying to become leaders at same time. Retry
                         // to fetch info on new leader.
                         // to fetch info on new leader.
 
-                        // We force the invalidation of the cache entry. Since we received a BadVersion error, we
-                        // already know that the entry is out of date. If we don't invalidate, we'd be retrying the
-                        // leader election many times until we finally receive the notification that invalidates the
-                        // cache.
-                        cache.invalidate(path);

Review Comment:
   If we don't invalidate the value here, I think we risk repeatedly retrying 
to acquire the lock, just as the removed comment warns. When the cached value 
is `None` and the actual leader is another broker, this broker will keep 
getting `BadVersionException` without completing any of the `result` objects.
   
   This might be a place where the implementation in #17254 is better, because 
an empty cache results in a read to the metadata store. I am wondering if it's 
worth using both PRs to reduce the edge cases during leader election where the 
current broker doesn't know the real leader.
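
The concern can be illustrated with a toy model. This is a sketch under simplifying assumptions and not the code in `LeaderElectionImpl`: the `Store`, `Cache`, and `elect` names are hypothetical, the store holds a single value, and no notification ever arrives to invalidate the cache. With a cached "no leader" entry and another broker already elected, each attempt fails with a version conflict; only invalidating the entry lets the next attempt read the real leader.

```java
import java.util.Optional;

/** Toy model (hypothetical names) of election retries against a stale cache entry. */
final class ElectionRetrySketch {
    static final class BadVersionException extends RuntimeException {}

    /** Single-value store: creating the leader fails if one already exists. */
    static final class Store {
        private String leader; // null means no leader
        Optional<String> read() { return Optional.ofNullable(leader); }
        void createIfAbsent(String candidate) {
            if (leader != null) throw new BadVersionException();
            leader = candidate;
        }
    }

    /** Cache in front of the store: only a missing entry causes a store read. */
    static final class Cache {
        private final Store store;
        private Optional<String> cached; // null means no entry
        Cache(Store store) { this.store = store; }
        Optional<String> get() {
            if (cached == null) cached = store.read();
            return cached;
        }
        void invalidate() { cached = null; }
    }

    /** Returns the attempt on which a leader became known, or -1 if attempts ran out. */
    static int elect(Store store, Cache cache, String self,
                     boolean invalidateOnConflict, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (cache.get().isPresent()) return attempt;      // someone else leads
            try {
                store.createIfAbsent(self);
                cache.invalidate();                           // drop the stale "no leader" entry
                return attempt;                               // we became leader
            } catch (BadVersionException e) {
                if (invalidateOnConflict) cache.invalidate(); // force a fresh read next time
            }
        }
        return -1;
    }
}
```

With invalidation the model resolves on the second attempt; without it, every attempt repeats the same conflict until something else clears the entry.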



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
