jtuglu1 commented on code in PR #19541:
URL: https://github.com/apache/druid/pull/19541#discussion_r3351396855


##########
indexing-service/src/main/java/org/apache/druid/indexing/overlord/supervisor/SupervisorManager.java:
##########
@@ -186,14 +186,101 @@ public boolean 
createOrUpdateAndStartSupervisor(SupervisorSpec spec)
 
     synchronized (lock) {
       Preconditions.checkState(started, "SupervisorManager not started");
-      final boolean shouldUpdateSpec = shouldUpdateSupervisor(spec);
+      // Persist whenever the spec actually changed (or is new) — independent 
of whether a restart is
+      // required. This stops/recreates the supervisor regardless; persistence 
must not be gated on the
+      // restart decision, otherwise a no-restart change (e.g. taskCount under 
autoscaling) would be
+      // applied to the running supervisor but lost from the metadata store.
+      final boolean specChanged = isSpecChangedAndValidate(spec);
       SupervisorSpec existingSpec = 
possiblyStopAndRemoveSupervisorInternal(spec.getId(), false);
       spec.merge(existingSpec);
-      createAndStartSupervisorInternal(spec, shouldUpdateSpec);
-      return shouldUpdateSpec;
+      createAndStartSupervisorInternal(spec, specChanged);
+      return specChanged;
     }
   }
 
+  /**
+   * Outcome of {@link #createOrUpdateAndStartSupervisor(SupervisorSpec, 
boolean)}.
+   */
+  public enum SpecUpdateOutcome
+  {
+    /** Spec was byte-identical to the running spec and {@code 
skipRestartIfUnmodified} was set: nothing done. */
+    UNCHANGED,
+    /** Spec changed but did not require a restart: persisted to metadata, 
running supervisor left in place. */
+    PERSISTED_WITHOUT_RESTART,
+    /** Supervisor was (re)created and started; spec persisted if it changed. 
*/
+    RESTARTED
+  }
+
+  /**
+   * Decides whether the submitted spec needs a restart and applies it under a 
single lock, so the decision
+   * cannot go stale between deciding and acting (which would let a concurrent 
POST drop a write or persist a
+   * spec that the running supervisor needs to be recreated for). With {@code 
skipRestartIfUnmodified} set, an
+   * unchanged spec is a no-op and a changed spec whose {@link 
SupervisorSpec#requireRestart} is false (e.g. a
+   * taskCount change under autoscaling) is persisted without recreating the 
supervisor; otherwise the
+   * supervisor is stopped and recreated (the only behavior when the flag is 
false).
+   */
+  public SpecUpdateOutcome createOrUpdateAndStartSupervisor(SupervisorSpec 
spec, boolean skipRestartIfUnmodified)

Review Comment:
   `skipRestartIfUnModified=false` -> should just mean we restart irrespective 
of semantic differences in the spec (e.g. current behavior)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to