cecemei commented on code in PR #19541:
URL: https://github.com/apache/druid/pull/19541#discussion_r3350993725


##########
indexing-service/src/main/java/org/apache/druid/indexing/overlord/supervisor/SupervisorManager.java:
##########
@@ -186,14 +186,101 @@ public boolean 
createOrUpdateAndStartSupervisor(SupervisorSpec spec)
 
     synchronized (lock) {
       Preconditions.checkState(started, "SupervisorManager not started");
-      final boolean shouldUpdateSpec = shouldUpdateSupervisor(spec);
+      // Persist whenever the spec actually changed (or is new) — independent 
of whether a restart is

Review Comment:
   this method is now `createOrUpdateAndStartSupervisor(spec, false)` ? also we 
dont need `shouldUpdateSupervisor` anymore?



##########
indexing-service/src/main/java/org/apache/druid/indexing/overlord/supervisor/SupervisorManager.java:
##########
@@ -186,14 +186,101 @@ public boolean 
createOrUpdateAndStartSupervisor(SupervisorSpec spec)
 
     synchronized (lock) {
       Preconditions.checkState(started, "SupervisorManager not started");
-      final boolean shouldUpdateSpec = shouldUpdateSupervisor(spec);
+      // Persist whenever the spec actually changed (or is new) — independent 
of whether a restart is
+      // required. This stops/recreates the supervisor regardless; persistence 
must not be gated on the
+      // restart decision, otherwise a no-restart change (e.g. taskCount under 
autoscaling) would be
+      // applied to the running supervisor but lost from the metadata store.
+      final boolean specChanged = isSpecChangedAndValidate(spec);
       SupervisorSpec existingSpec = 
possiblyStopAndRemoveSupervisorInternal(spec.getId(), false);
       spec.merge(existingSpec);
-      createAndStartSupervisorInternal(spec, shouldUpdateSpec);
-      return shouldUpdateSpec;
+      createAndStartSupervisorInternal(spec, specChanged);
+      return specChanged;
     }
   }
 
+  /**
+   * Outcome of {@link #createOrUpdateAndStartSupervisor(SupervisorSpec, 
boolean)}.
+   */
+  public enum SpecUpdateOutcome
+  {
+    /** Spec was byte-identical to the running spec and {@code 
skipRestartIfUnmodified} was set: nothing done. */
+    UNCHANGED,
+    /** Spec changed but did not require a restart: persisted to metadata, 
running supervisor left in place. */
+    PERSISTED_WITHOUT_RESTART,
+    /** Supervisor was (re)created and started; spec persisted if it changed. 
*/
+    RESTARTED
+  }
+
+  /**
+   * Decides whether the submitted spec needs a restart and applies it under a 
single lock, so the decision
+   * cannot go stale between deciding and acting (which would let a concurrent 
POST drop a write or persist a
+   * spec that the running supervisor needs to be recreated for). With {@code 
skipRestartIfUnmodified} set, an
+   * unchanged spec is a no-op and a changed spec whose {@link 
SupervisorSpec#requireRestart} is false (e.g. a
+   * taskCount change under autoscaling) is persisted without recreating the 
supervisor; otherwise the
+   * supervisor is stopped and recreated (the only behavior when the flag is 
false).
+   */
+  public SpecUpdateOutcome createOrUpdateAndStartSupervisor(SupervisorSpec 
spec, boolean skipRestartIfUnmodified)

Review Comment:
   nit: maybe we need another `forceRestart`? since supervisor might not 
restart when spec is modified but it didnt require so?



##########
indexing-service/src/main/java/org/apache/druid/indexing/overlord/supervisor/SupervisorResource.java:
##########
@@ -165,29 +165,39 @@ public Response specPost(
                            .build();
           }
 
-          if (Boolean.TRUE.equals(skipRestartIfUnmodified) && 
!manager.shouldUpdateSupervisor(spec)) {
-            return Response.ok(ImmutableMap.of("id", spec.getId(), 
"restarted", false)).build();
-          }
+          // Decide and apply atomically so the restart decision cannot go 
stale under a concurrent POST.
+          final SupervisorManager.SpecUpdateOutcome outcome =
+              manager.createOrUpdateAndStartSupervisor(spec, 
Boolean.TRUE.equals(skipRestartIfUnmodified));
 
-          manager.createOrUpdateAndStartSupervisor(spec);
-
-          final String auditPayload
-              = StringUtils.format("Update supervisor[%s] for datasource[%s]", 
spec.getId(), spec.getDataSources());
-          auditManager.doAudit(
-              AuditEntry.builder()
-                        .key(spec.getId())
-                        .type("supervisor")
-                        .auditInfo(AuthorizationUtils.buildAuditInfo(req))
-                        
.request(AuthorizationUtils.buildRequestInfo("overlord", req))
-                        .payload(auditPayload)
-                        .build()
-          );
+          // Audit any path that mutated the persisted spec; a no-op UNCHANGED 
submission is not audited.
+          if (outcome != SupervisorManager.SpecUpdateOutcome.UNCHANGED) {
+            auditSupervisorUpdate(spec, req);
+          }
 
-          return Response.ok(ImmutableMap.of("id", spec.getId(), "restarted", 
true)).build();
+          final boolean restarted = outcome == 
SupervisorManager.SpecUpdateOutcome.RESTARTED;
+          return Response.ok(ImmutableMap.of("id", spec.getId(), "restarted", 
restarted)).build();

Review Comment:
   nit: my hunch is that maybe we should have a `modified` boolean field as 
well since sometimes it modifies but doesnt restart?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to