noob-se7en opened a new pull request, #18559:
URL: https://github.com/apache/pinot/pull/18559

   ## Summary
   
   Fixes a race condition in `IdealStateGroupCommit` where a leader's own 
queued entry could be silently applied by a subsequent batch after its caller 
had already concluded the commit failed and run cleanup. In the pauseless 
segment commit path, this produces "in IdealState, no ZK metadata" orphans that 
block `RealtimeSegmentValidationManager` from making any further progress.
   
   ## Problem
   
   When a leader thread's batch fails (typically because a co-batched entry 
throws `PermanentUpdaterException` after exceeding the max segment completion 
time deadline), the leader's `commit()` exits with the exception. But if the 
leader's own entry was never iterated — because iteration stopped at the 
throwing entry earlier in FIFO order — the entry remains in `_pending`. A 
subsequent leader picks it up and applies its updater.
   
   Meanwhile, the original leader's caller (e.g., 
`PinotLLCRealtimeSegmentManager.commitSegmentMetadataInternal`) catches the 
thrown exception and runs cleanup — calling `removeSegmentZKMetadataBestEffort` 
on the newly-created segment from Step 2. By the time the cleanup completes, 
the subsequent leader has applied the queued update and written the segment 
into IdealState. End state: segment is in IdealState as CONSUMING, but its ZK 
metadata is gone.
   
   ## Fix
   
   Add a `volatile boolean _cancelled` flag to `Entry`. In `commit()`'s catch 
path, set the flag on the leader's own entry. The next leader's iteration check 
skips and removes any cancelled entry.
   
   ```java
   // In the iteration loop:
   if (ent._cancelled) {
     it.remove();
     continue;
   }
   
   // In the catch block:
   entry._cancelled = true;
   processed.add(entry);
   ```
   
   Why a flag instead of `ConcurrentLinkedQueue.remove(entry)`: the flag is 
O(1); `remove(Object)` is O(n) because `ConcurrentLinkedQueue` is a linked list.
   
   The fix preserves the pre-existing all-or-nothing batch semantics (any 
thrown exception aborts the CAS write, all processed entries get the exception, 
the caller still throws). The only additional behavior: the leader's own entry 
can no longer outlive a failed batch and be silently applied later.
   
   ## Test plan
   
   - [x] `IdealStateGroupCommitTest` — 8/8 pass
     - `testFreshUpdaterAppliedAfterCallerThrows` (regression): caller throws, 
drainer skips the cancelled entry, no orphan
     - `testMultipleStuckAndFreshConsistency`: every owner's observed outcome 
matches IdealState ground truth
     - `testNoOrphanWhenCoBatchedEntryThrowsDuringStepThree`: integration test 
walking Step 2 + Step 3 + cleanup chain; asserts no orphan can form
     - `testGroupCommit` (5× invocation): unchanged, all pass
   - [x] `HelixHelperTest` — 2/2 pass (incl. single-caller 
`PermanentUpdaterException` contract test)
   - [x] `PinotLLCRealtimeSegmentManagerTest*` — 39/39 pass
   - [x] `spotless:apply`, `checkstyle:check`, `license:check` on 
`pinot-common` and `pinot-controller`
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to