hongkunxu opened a new pull request, #18564: URL: https://github.com/apache/pinot/pull/18564
## Motivation Today, "is this a materialized view (MV) table?" requires scattered heuristics: - scanning `tableConfig.task` for a `MaterializedViewTask` entry, - fetching the MV definition znode from ZK, - or parsing `definedSQL` with `CalciteSqlParser`. This has three concrete problems: 1. **No single source of truth.** Different call sites use different checks; they can disagree and tend to drift apart over time. 2. **Identity is observable only after the scheduler has run.** The MV definition znode is created lazily by the scheduler on the first task run, not synchronously at table create — so any caller that asks "is MV?" between create and the first scheduler tick gets the wrong answer. 3. **Task config doubles as identity.** `MaterializedViewTask` (and `definedSQL`) describe *execution*, not *identity*. Using them for identity means the create/update path cannot reject malformed combinations early, and there is no place to anchor backward-compatibility invariants like "`definedSQL` is immutable". ## What this PR does Introduce `isMaterializedView` on `TableConfig` as the **single source of truth** for MV identity, modeled on `isDimTable`. Every code path that needs to know "is this an MV table?" now reads exactly one boolean. ## Resulting layering | Layer | Responsibility | Owns | |-------|----------------|------| | **`pinot-spi` — `TableConfig.isMaterializedView`** | MV identity (declarative). | `IS_MATERIALIZED_VIEW_KEY` field, getter, `TableConfigBuilder.setIsMaterializedView`, ZN serde, DDL property mapping. | | **`pinot-segment-local` — `TableConfigUtils`** | Structural invariants. | `validateMaterializedViewInvariants` (flag ⇒ `OFFLINE` + `MaterializedViewTask` + non-empty `definedSQL`; task without flag rejected; task without `definedSQL` rejected; cannot also be `isDimTable`). `validateBackwardCompatibility` — flag is immutable; `definedSQL` is immutable after creation; `MaterializedViewTask` cannot be removed from an MV table. | | **`pinot-materialized-view` — `MaterializedViewAnalyzer` / `Scheduler`** | Semantic validation + task generation. | `analyze()` requires the flag; `MaterializedViewTaskScheduler.generateTasks` skips non-MV tables. | | **`pinot-controller` — `PinotHelixResourceManager`** | Wire MV into cluster lifecycle. | Both `notifyMaterializedViewConsistencyManagerForTableCreate` and `…ForTableDrop` gated on the flag; the drop path no longer falls back to task-config scan or `CalciteSqlParser` to infer identity. | | **ZK `Definition` / `Runtime` znodes** | Runtime view of the MV instance. | Unchanged — still drive `watermarkMs`, partition VALID/STALE, etc. Identity and runtime are now cleanly separated. | Each layer depends only on the one above, with no circular reads. `MaterializedViewTask` keeps its original role: carrier of execution parameters (`definedSQL`, `bucketTimePeriod`, …), not identity. ## Benefits - **One bit answers identity** — no more scanning task configs, fetching znodes, or parsing SQL just to ask "is this an MV?". Identity is now correct from the moment the table config is written, not from the first scheduler tick. - **Fail-fast on create/update** — malformed combinations (e.g. `MaterializedViewTask` without the flag, flag without `OFFLINE`, flag together with `isDimTable`) are rejected at the validation gate, instead of producing a half-configured MV that the scheduler / consistency manager later trips over. - **Immutability has a place to live** — `definedSQL` immutability and "can't drop `MaterializedViewTask` from an MV table" are now enforced by `validateBackwardCompatibility`, aligned with `pinot-materialized-view/DESIGN.md`. - **Clean module boundaries** — `pinot-spi` owns the flag, `pinot-segment-local` owns the structural rules, `pinot-materialized-view` owns the semantics, `pinot-controller` only consumes the flag. `pinot-segment-local` does **not** depend on `pinot-materialized-view`. - **Easier to extend** — future MV-aware code (broker rewrite, REST listings, UI filters, etc.) can rely on a single declarative bit instead of replicating heuristics. ## Tests - `MaterializedViewAnalyzerTest`: every MV `TableConfigBuilder` now calls `.setIsMaterializedView(true)`; new test asserts `analyze()` fails when the flag is missing. 50/50 pass. - `TableConfigUtilsTest`: new tests cover the four `validateMaterializedViewInvariants` branches and the backward-compat rules (`definedSQL` immutable, flag cannot change). Existing positional `TableConfig(...)` call sites updated for the new parameter. - Checkstyle clean on `pinot-spi`, `pinot-common`, `pinot-segment-local`, `pinot-materialized-view`, `pinot-sql-ddl`, `pinot-controller`. - `pinot-connectors/pinot-spark-3-connector` recompiled against the new constructor signature. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
