krishan1390 opened a new pull request, #18518: URL: https://github.com/apache/pinot/pull/18518
## Why this change is needed `PinotHelixResourceManager.deleteSegments` historically had **zero lineage awareness**. The per-table `SegmentLineage` znode tracks segment replacements (`segmentsFrom -> segmentsTo`) across three states (`IN_PROGRESS`, `COMPLETED`, `REVERTED`), but every external delete path — REST DELETEs, `RetentionManager`, minion task generators, push-failure cleanup — could remove a segment regardless of whether it participated in a live lineage entry. Symmetrically, `endReplaceSegments` only checked that `segmentsTo` are ONLINE; it never re-validated that `segmentsFrom` still existed. This produced four concrete failure modes: 1. `startReplaceSegments(A1 -> B)` races with a concurrent `deleteSegment(A1)` and commits an `IN_PROGRESS` entry referencing a now-missing `A1`. `revertReplaceSegments` then fails its own existence check without `forceRevert=true`. 2. Lineage `IN_PROGRESS` with `A1` in `segmentsFrom` — `deleteSegment(A1)` silently breaks the revert path. 3. Lineage `COMPLETED` with `A1` in `segmentsFrom` — `A1` should only be deleted by the lineage cleanup path (`DefaultLineageManager.updateLineageForRetention`). An external delete bypasses retention/revert semantics. 4. Lineage `IN_PROGRESS` with `B` in `segmentsTo` — `deleteSegment(B)` makes `endReplaceSegments` hang and then fail on the 10-minute ONLINE poll. ## The blocked-set rule A segment `s` is "lineage-locked" iff some lineage entry has: - state `IN_PROGRESS` and `s ∈ (segmentsFrom ∪ segmentsTo)`, OR - state `COMPLETED` and `s ∈ segmentsFrom`. `REVERTED` entries lock nothing. This is the same set that `SegmentLineageUtils.filterSegmentsBasedOnLineageInPlace` already removes from query routing — i.e. anything currently hidden from queries because of lineage is also blocked from external deletion. The new `SegmentLineageUtils.getDeleteBlockedSegments` helper lives next to that filter so the symmetry is obvious. ## The correctness mechanism The change is intentionally asymmetric and avoids any new ZK lock or CAS on the delete hot path. The lineage znode is the globally-visible coordination point, and correctness is enforced by checks at **both** ends of the lifecycle: | Interleaving | Outcome | |---|---| | Lineage write commits first; delete reads lineage after | `deleteSegments` sees the entry, fails with `SegmentsInLineageException` (REST 409). IdealState is left untouched. | | Delete commits IdealState removal first; lineage write commits after | Both commit. `endReplaceSegments`'s new re-validation of `segmentsFrom` against IdealState fails with an actionable error. Stranded `IN_PROGRESS` entry; recovered via `revertReplaceSegments(forceRevert=true)`. | | Truly concurrent reads | Same as above — `endReplaceSegments` catches it. | The remaining race window is small **and** recoverable: it produces a stranded `IN_PROGRESS` entry, never silent corruption. This applies uniformly to offline and realtime tables because both flow through the same `PinotHelixResourceManager.deleteSegments` primitive. ## Changes ### `PinotHelixResourceManager` - New private `deleteSegmentsInternal(table, segments, retentionPeriod, bypassLineageCheck)`; public `deleteSegments(...)` overloads delegate with `bypassLineageCheck=false`. - New public `deleteSegmentsForLineageCleanup(...)` overloads delegate with `bypassLineageCheck=true` — reserved for internal callers that have already coordinated with the lineage lifecycle (`startReplaceSegments` proactive cleanup, `revertReplaceSegments` post-revert cleanup, `RetentionManager`'s lineage retention pass). - Lineage pre-check runs **before** `HelixHelper.removeSegmentsFromIdealState`. A single ZK read; fast-path on missing znode; fail-closed on ZK error. - The whole batch is rejected on any conflict — no partial commit semantics on IdealState. - `endReplaceSegments` now re-validates that every segment in `lineageEntry.getSegmentsFrom()` is still present in IdealState before flipping the entry to `COMPLETED`. The authoritative source is IdealState (not segment ZK metadata, which lingers in `SegmentDeletionManager`'s retention window after a delete). ### `SegmentsInLineageException` (new, package-local typed exception) Carries `tableNameWithType` and the set of `blockingSegments`. Mapped by `PinotSegmentRestletResource` to HTTP **409 Conflict**. ### `RetentionManager` - Routes the legitimate lineage cleanup call through the bypass overload. - New `removeLineageLockedSegments` helper filters lineage-locked segments out of the offline / realtime / hybrid retention batches before calling `deleteSegments`, so the public check never has to reject a retention batch. Lineage-locked segments are left in place until the lineage retention pass decides to remove them. ### `ControllerConf` - New `controller.lineage.exclusive.delete.enabled` kill switch (default `true`) to fall back to legacy behavior during a rolling upgrade if needed. ### `ControllerMeter` - New `LINEAGE_BLOCKED_DELETE_COUNT` table-scoped meter, incremented when a delete request is rejected. ## Recovery contract For the narrow race that produces a stranded `IN_PROGRESS` lineage entry, the documented recovery is: ``` revertReplaceSegments(tableNameWithType, lineageEntryId, forceRevert=true, ...) ``` `forceRevert=true` bypasses the `segmentsFrom` existence check (already supported); after the revert, `segmentsTo` are cleaned up via the bypass overload and the entry transitions to `REVERTED`. `endReplaceSegments`'s new error message names the missing `segmentsFrom` and points operators at this path. ## Test plan - [ ] `mvn test -pl pinot-controller -Dtest=LineageDeleteExclusionTest` (functional coverage for all blocked / allowed states, batch rejection, bypass overload, ZK fast-path, and the `endReplaceSegments` recovery flow) - [ ] `mvn test -pl pinot-common -Dtest=SegmentLineageUtilsTest` (unit coverage for the new helper) - [ ] `mvn test -pl pinot-controller -Dtest=SegmentLineageCleanupTest,PinotHelixResourceManagerStatelessTest,RetentionManagerTest` (regression coverage on existing lineage / retention paths) - [ ] Manual: with `controller.lineage.exclusive.delete.enabled=false`, confirm legacy behavior; re-enable, confirm new 409 path - [ ] Manual: trigger an `endReplaceSegments` after a bypass-deleted `segmentsFrom` and confirm the error message names the missing segments and points at `forceRevert=true` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
