j1wonpark opened a new issue, #4253: URL: https://github.com/apache/amoro/issues/4253
## Search before asking - [x] I have searched in the [issues](https://github.com/apache/amoro/issues?q=is%3Aissue) and found no similar issues. ## Description This is the first implementation phase of **AIP-5: Dynamic Resource Allocation for Optimizer**. It lays the configuration foundation that the later scaling phases build on, with no runtime scaling behavior yet. Dynamic allocation is opt-in and disabled by default (`dynamic-allocation.enabled = false`), so existing groups are unaffected and this phase is safe to merge independently. The full `dynamic-allocation.*` property set is declared here so the configuration surface and its validation are settled in one place, rather than drip-fed per phase. Scope: 1. **Configuration properties.** Declare the `dynamic-allocation.*` properties as constants in `OptimizerProperties`, each annotated with `@since`: - `dynamic-allocation.enabled` (default `false`) - `dynamic-allocation.min-parallelism` (default `0`) - `dynamic-allocation.max-parallelism` (required when enabled; hard-capped at `1024`) - `dynamic-allocation.scheduler-backlog-timeout` (default `1min`) - `dynamic-allocation.sustained-backlog-timeout` (default `30s`) - `dynamic-allocation.executor-idle-timeout` (default `5min`, minimum `30s`) - `dynamic-allocation.scale-down-cooldown` (default `1min`) - `dynamic-allocation.drain-timeout` (default `15min`) 2. **`DynamicAllocationConfig` parsing + validation.** A new `DynamicAllocationConfig.parse(ResourceGroup)` / `validate()` that enforces: - `max-parallelism` is mandatory when enabled; - `min-parallelism ≤ max-parallelism ≤ 1024`; - `executor-idle-timeout ≥ 30s`; - all durations parse to positive values; - DRA is not enabled on an externally-registered optimizer (conservative check in this phase: reject the `external` container). 3. **Failure handling.** - **REST create/update — fail fast.** `OptimizerGroupController.createResourceGroup` / `updateResourceGroup` reject an invalid DRA config with **HTTP 400**. (The update path previously had no property-value validation at all.) - **Startup load — fail safe, never silent.** A persisted group with an invalid DRA config no longer crashes AMS; it logs a warning and falls back to DRA-disabled behavior. (The `optimizer_group_config_invalid` gauge is wired in the observability phase.) 4. **`min-parallelism` deprecation.** The flat `min-parallelism` property is deprecated in favor of `dynamic-allocation.min-parallelism` but still honored. Resolution order: namespaced → legacy → `0`. A one-off deprecation warning is logged at config-entry points (startup load, REST), not on the keeper hot path. 5. **Auto-reset compatibility.** The keeper already auto-resets a group's `min-parallelism` after `optimizer-group.max-keeping-attempts` consecutive failed optimizer creations. Two adjustments keep this consistent with the new property model: - **Disabled when DRA is effectively enabled.** Once a group opts into dynamic allocation (and its config is valid), DRA owns the group's scale decisions, so the keeper no longer erodes its `min-parallelism` floor — it keeps retrying instead. "Effectively enabled" means opted-in *and* valid; an invalid config is treated as disabled, matching the startup fail-safe, so such groups still auto-reset as before. - **Writes the key the group actually uses.** When auto-reset does apply, it now writes whichever `min-parallelism` key the group reads from (namespaced if present, else the legacy flat key). Previously it always wrote the flat key; for a group using the namespaced property that write was shadowed by resolution order, turning auto-reset into an endless no-op loop (repeated warning + DB update with no effect). Out of scope for this phase (later phases): demand-driven scale-up, idle tracking, scale-down + graceful drain, and the new metrics. The `config_invalid` gauge and `scale_*_total` counters land in the observability phase. ## Parent issue AIP-5: https://github.com/apache/amoro/issues/4191 ## Are you willing to submit PR? - [x] Yes I am willing to submit a PR! ## Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
