krishan1390 opened a new pull request, #18518:
URL: https://github.com/apache/pinot/pull/18518

   ## Why this change is needed
   
   `PinotHelixResourceManager.deleteSegments` historically had **zero lineage 
awareness**. The per-table `SegmentLineage` znode tracks segment replacements 
(`segmentsFrom -> segmentsTo`) across three states (`IN_PROGRESS`, `COMPLETED`, 
`REVERTED`), but every external delete path — REST DELETEs, `RetentionManager`, 
minion task generators, push-failure cleanup — could remove a segment 
regardless of whether it participated in a live lineage entry. Symmetrically, 
`endReplaceSegments` only checked that `segmentsTo` are ONLINE; it never 
re-validated that `segmentsFrom` still existed.
   
   This produced four concrete failure modes:
   
   1. `startReplaceSegments(A1 -> B)` races with a concurrent 
`deleteSegment(A1)` and commits an `IN_PROGRESS` entry referencing a 
now-missing `A1`. `revertReplaceSegments` then fails its own existence check 
without `forceRevert=true`.
   2. Lineage `IN_PROGRESS` with `A1` in `segmentsFrom` — `deleteSegment(A1)` 
silently breaks the revert path.
   3. Lineage `COMPLETED` with `A1` in `segmentsFrom` — `A1` should only be 
deleted by the lineage cleanup path 
(`DefaultLineageManager.updateLineageForRetention`). An external delete 
bypasses retention/revert semantics.
   4. Lineage `IN_PROGRESS` with `B` in `segmentsTo` — `deleteSegment(B)` makes 
`endReplaceSegments` hang and then fail on the 10-minute ONLINE poll.
   
   ## The blocked-set rule
   
   A segment `s` is "lineage-locked" iff some lineage entry has:
   
   - state `IN_PROGRESS` and `s ∈ (segmentsFrom ∪ segmentsTo)`, OR
   - state `COMPLETED` and `s ∈ segmentsFrom`.
   
   `REVERTED` entries lock nothing. This is the same set that 
`SegmentLineageUtils.filterSegmentsBasedOnLineageInPlace` already removes from 
query routing — i.e. anything currently hidden from queries because of lineage 
is also blocked from external deletion. The new 
`SegmentLineageUtils.getDeleteBlockedSegments` helper lives next to that filter 
so the symmetry is obvious.
   
   ## The correctness mechanism
   
   The change is intentionally asymmetric and avoids any new ZK lock or CAS on 
the delete hot path. The lineage znode is the globally-visible coordination 
point, and correctness is enforced by checks at **both** ends of the lifecycle:
   
   | Interleaving | Outcome |
   |---|---|
   | Lineage write commits first; delete reads lineage after | `deleteSegments` 
sees the entry, fails with `SegmentsInLineageException` (REST 409). IdealState 
is left untouched. |
   | Delete commits IdealState removal first; lineage write commits after | 
Both commit. `endReplaceSegments`'s new re-validation of `segmentsFrom` against 
IdealState fails with an actionable error. Stranded `IN_PROGRESS` entry; 
recovered via `revertReplaceSegments(forceRevert=true)`. |
   | Truly concurrent reads | Same as above — `endReplaceSegments` catches it. |
   
   The remaining race window is small **and** recoverable: it produces a 
stranded `IN_PROGRESS` entry, never silent corruption. This applies uniformly 
to offline and realtime tables because both flow through the same 
`PinotHelixResourceManager.deleteSegments` primitive.
   
   ## Changes
   
   ### `PinotHelixResourceManager`
   - New private `deleteSegmentsInternal(table, segments, retentionPeriod, 
bypassLineageCheck)`; public `deleteSegments(...)` overloads delegate with 
`bypassLineageCheck=false`.
   - New public `deleteSegmentsForLineageCleanup(...)` overloads delegate with 
`bypassLineageCheck=true` — reserved for internal callers that have already 
coordinated with the lineage lifecycle (`startReplaceSegments` proactive 
cleanup, `revertReplaceSegments` post-revert cleanup, `RetentionManager`'s 
lineage retention pass).
   - Lineage pre-check runs **before** 
`HelixHelper.removeSegmentsFromIdealState`. A single ZK read; fast-path on 
missing znode; fail-closed on ZK error.
   - The whole batch is rejected on any conflict — no partial commit semantics 
on IdealState.
   - `endReplaceSegments` now re-validates that every segment in 
`lineageEntry.getSegmentsFrom()` is still present in IdealState before flipping 
the entry to `COMPLETED`. The authoritative source is IdealState (not segment 
ZK metadata, which lingers in `SegmentDeletionManager`'s retention window after 
a delete).
   
   ### `SegmentsInLineageException` (new, package-local typed exception)
   Carries `tableNameWithType` and the set of `blockingSegments`. Mapped by 
`PinotSegmentRestletResource` to HTTP **409 Conflict**.
   
   ### `RetentionManager`
   - Routes the legitimate lineage cleanup call through the bypass overload.
   - New `removeLineageLockedSegments` helper filters lineage-locked segments 
out of the offline / realtime / hybrid retention batches before calling 
`deleteSegments`, so the public check never has to reject a retention batch. 
Lineage-locked segments are left in place until the lineage retention pass 
decides to remove them.
   
   ### `ControllerConf`
   - New `controller.lineage.exclusive.delete.enabled` kill switch (default 
`true`) to fall back to legacy behavior during a rolling upgrade if needed.
   
   ### `ControllerMeter`
   - New `LINEAGE_BLOCKED_DELETE_COUNT` table-scoped meter, incremented when a 
delete request is rejected.
   
   ## Recovery contract
   
   For the narrow race that produces a stranded `IN_PROGRESS` lineage entry, 
the documented recovery is:
   
   ```
   revertReplaceSegments(tableNameWithType, lineageEntryId, forceRevert=true, 
...)
   ```
   
   `forceRevert=true` bypasses the `segmentsFrom` existence check (already 
supported); after the revert, `segmentsTo` are cleaned up via the bypass 
overload and the entry transitions to `REVERTED`. `endReplaceSegments`'s new 
error message names the missing `segmentsFrom` and points operators at this 
path.
   
   ## Test plan
   - [ ] `mvn test -pl pinot-controller -Dtest=LineageDeleteExclusionTest` 
(functional coverage for all blocked / allowed states, batch rejection, bypass 
overload, ZK fast-path, and the `endReplaceSegments` recovery flow)
   - [ ] `mvn test -pl pinot-common -Dtest=SegmentLineageUtilsTest` (unit 
coverage for the new helper)
   - [ ] `mvn test -pl pinot-controller 
-Dtest=SegmentLineageCleanupTest,PinotHelixResourceManagerStatelessTest,RetentionManagerTest`
 (regression coverage on existing lineage / retention paths)
   - [ ] Manual: with `controller.lineage.exclusive.delete.enabled=false`, 
confirm legacy behavior; re-enable, confirm new 409 path
   - [ ] Manual: trigger an `endReplaceSegments` after a bypass-deleted 
`segmentsFrom` and confirm the error message names the missing segments and 
points at `forceRevert=true`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to