gortiz commented on PR #18668: URL: https://github.com/apache/pinot/pull/18668#issuecomment-4647621154
## Review notes (non-blocking — flagging for maintainer consideration) I reviewed this PR focusing on the validation/serialization changes. CI is green, but I think the green is partly misleading (details below). Two things I'd like the maintainers to weigh in on before merge. ### 1. Update path for existing tables `TableConfigUtils.validate` unconditionally rejects any `FieldConfig` with a field-level `encodingType`, and that method runs on table **update** (`PinotTableRestletResource.java:364`) and the validate endpoint (`:868`), not just on create. Any deployment with a table whose **stored** config carries `fieldConfigList[].encodingType` — historically the documented and common form — would be unable to apply *any* subsequent config update until an operator manually rewrites it into `indexes.forward.encodingType`. No auto-migration runs on this path (`createTableConfigFromOldFormat` has no production callers). Suggestion: auto-rewrite the deprecated field into the forward block during validation/normalization and emit a deprecation warning, rather than hard-rejecting — or gate the rejection behind a flag / staged rollout with an explicit migration step. Per the repo's mixed-version compat mandate, removing an accepted config key on the write path probably wants a migration story (migration notes / rollback / flag), none of which is in the PR today. ### 2. CI green doesn't cover the break The compatibility-verifier suite creates tables on the *old* binary and never updates a legacy config on the *new* controller, so the update-path break above is unverified. Relatedly, those suite configs themselves still use the deprecated top-level form — e.g. `compatibility-verifier/sample-test-suite/config/feature-test-1.json:4` still has `"encodingType": "RAW"`. The PR migrated the pinot-tools samples and README but missed these. Worth either migrating them or documenting why the compat path is intentionally exempt. A regression test that updates a stored legacy config on the new controller would close the gap. ### Smaller notes - **Serialization shape change:** `FieldConfig` no longer defaults `_encodingType` to DICTIONARY and omits the key when unset. Self-roundtrip is tested; mixed-version (old broker/server reading a new-controller config that omits the key) is not. - **Silent precedence change:** `ForwardIndexType.createDeserializer` dropped the conflict check, so a config setting both field-level and forward-block encoding to *conflicting* values now silently uses the forward value instead of erroring. ### Positives - The `List`→`LinkedHashSet` change in `DictionaryIndexType.handleIndexSpecificCleanup` fixes a latent double-update bug. - The rejection message is excellent — it names the column and the exact replacement key. - README, UI, and pinot-tools sample configs were migrated consistently. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
