dlmarion commented on code in PR #2778:
URL: https://github.com/apache/accumulo/pull/2778#discussion_r934597887


##########
server/base/src/main/java/org/apache/accumulo/server/conf/ServerConfigurationFactory.java:
##########
@@ -140,4 +163,119 @@ public void connectionEvent() {
       // no-op. changes handled by prop store impl
     }
   }
+
+  private class ConfigRefreshRunner {
+    private static final long MIN_JITTER_DELAY = 1;
+    private static final long MAX_JITTER_DELAY = 23;
+    private final ScheduledFuture<?> refreshTaskFuture;
+
+    ConfigRefreshRunner() {
+
+      Runnable refreshTask = this::verifySnapshotVersions;
+
+      ScheduledThreadPoolExecutor executor = ThreadPools.getServerThreadPools()
+          .createScheduledExecutorService(1, "config-refresh", false);
+
+      // staggering the initial delay prevents synchronization of Accumulo 
servers communicating
+      // with ZooKeeper for the sync process. (Value is 25% -> 100% of the 
refresh period.)
+      long randDelay = jitter(REFRESH_PERIOD_MINUTES / 4, 
REFRESH_PERIOD_MINUTES);
+      refreshTaskFuture =
+          executor.scheduleWithFixedDelay(refreshTask, randDelay, 
REFRESH_PERIOD_MINUTES, MINUTES);
+    }
+
+    /**
+     * Check that the stored version in ZooKeeper matches the version held in 
the local snapshot.
+     * When a mismatch is detected, a change event is sent to the prop store 
which will cause a
+     * re-load. If the Zookeeper node has been deleted, the local cache 
entries are removed.
+     * <p>
+     * This method is designed to be called as a scheduled task, so it does 
not propagate exceptions
+     * other than interrupted Exceptions so the scheduled tasks will continue 
to run.
+     */
+    private void verifySnapshotVersions() {
+
+      // short circuit if refresh in progress
+      if (isConfigRefreshRunning.get()) {
+        return;
+      }
+
+      // allow only one thread if missed short circuit check.
+      refreshLock.lock();
+      try {
+        isConfigRefreshRunning.set(true);
+        long refreshStart = System.nanoTime();
+        int keyCount = 0;
+        int keyChangedCount = 0;
+
+        PropStore propStore = context.getPropStore();
+        keyCount++;
+
+        // rely on store to propagate change event if different
+        propStore.validateDataVersion(SystemPropKey.of(context),
+            ((ZooBasedConfiguration) 
getSystemConfiguration()).getDataVersion());
+        // small yield - spread out ZooKeeper calls
+        jitterDelay();
+
+        for (Map.Entry<NamespaceId,NamespaceConfiguration> entry : 
namespaceConfigs.entrySet()) {
+          keyCount++;
+          PropStoreKey<?> propKey = NamespacePropKey.of(context, 
entry.getKey());
+          if (!propStore.validateDataVersion(propKey, 
entry.getValue().getDataVersion())) {
+            keyChangedCount++;
+            namespaceConfigs.remove(entry.getKey());
+          }
+          // small yield - spread out ZooKeeper calls between namespace config 
checks
+          jitterDelay();

Review Comment:
   I understand the start jitter and agree, I just don't think these are 
necessary. Even if two servers started at the same exact time *and* their first 
task started at the same time I believe that the execution times of the two 
servers would deviate over time as:
   
   1. The execution time of the method is going to vary a little, and
   2. The ScheduledExecutorService does not start a task *exactly* on time.
   
   If you want to leave them in, I'm good with that. I just think they are 
unnecessary over the long term. I agree with the initial jitter to prevent a 
thundering herd situation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to