yashmayya opened a new pull request, #18113: URL: https://github.com/apache/pinot/pull/18113
## Summary

- **Leader coordination**: Only the lead controller runs `ResponseStoreCleaner` by gating `processTables()` on `isLeaderForTable(TASK_NAME)`, preventing all controllers from racing to delete the same expired responses on each broker.
- **Graceful concurrent deletion on broker**: `AbstractResponseStore.deleteResponse()` now catches exceptions from `readResponse()` when files vanish between the `exists()` check and the read (TOCTOU race). `FsResponseStore.deleteResponseImpl()` catches exceptions from `pinotFS.delete()` and treats already-gone directories as success instead of throwing.
- **No batch abort on individual failures**: `ResponseStoreCleaner.deleteExpiredResponses()` logs individual DELETE failures as warnings instead of throwing a `RuntimeException`, so one failed DELETE no longer aborts the entire broker's cleanup batch.

## Root cause

When multiple controllers run the `ResponseStoreCleaner` concurrently (all controllers run it because `processTables()` ignores the table leadership list), they race to delete the same expired cursor responses on each broker. The broker's `deleteResponse()` has a TOCTOU race between `exists()` → `readResponse()` → `deleteResponseImpl()`: when one controller deletes a cursor's files, the others hit `FileNotFoundException` / `IOException`, and the broker returns HTTP 500 instead of 404. The controller's `deleteExpiredResponses()` then throws on any single 500, which aborts that broker's cleanup batch, including the logging of the deletes that succeeded.

## Test plan

- [x] Existing `ResponseStoreCleanerTest` tests pass (including `testPartialBrokerFailureDoesNotBlockOthers` and `testCleanupTreats404AsSuccess`)
- [ ] Verify in a multi-controller environment that only the lead controller runs the cleaner
- [ ] Verify that concurrent DELETE requests to the broker no longer cause HTTP 500s

🤖 Generated with [Claude Code](https://claude.com/claude-code)
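For readers unfamiliar with the pattern, the two broker-side and controller-side ideas described above (treat "already gone" as success, and do not let one failure abort a batch) can be sketched roughly as follows. This is not the code in this PR: it uses plain `java.nio.file` instead of `PinotFS`, and the class and method names (`IdempotentDeleteSketch`, `deleteDirIfPresent`, `deleteAll`) are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; names do not come from Apache Pinot.
class IdempotentDeleteSketch {

  /**
   * Deletes a directory tree. Returns true if this call removed anything,
   * false if the directory was already gone (e.g. a concurrent caller won the race).
   */
  public static boolean deleteDirIfPresent(Path dir) throws IOException {
    List<Path> entries;
    try (Stream<Path> walk = Files.walk(dir)) {
      // Reverse order puts children before their parent directory.
      entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    } catch (NoSuchFileException e) {
      return false; // directory vanished before we could list it
    } catch (UncheckedIOException e) {
      if (e.getCause() instanceof NoSuchFileException) {
        return false; // entries vanished while listing; assume another caller is deleting
      }
      throw e.getCause();
    }
    boolean deletedAny = false;
    for (Path p : entries) {
      // deleteIfExists does not throw when the path is already missing.
      deletedAny |= Files.deleteIfExists(p);
    }
    return deletedAny;
  }

  /** Attempts every delete; logs failures and returns how many failed instead of throwing. */
  public static int deleteAll(List<Path> dirs) {
    int failures = 0;
    for (Path dir : dirs) {
      try {
        deleteDirIfPresent(dir);
      } catch (IOException e) {
        failures++;
        System.err.println("WARN: failed to delete " + dir + ": " + e);
      }
    }
    return failures;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("response");
    Files.writeString(dir.resolve("response.json"), "{}");
    System.out.println(deleteDirIfPresent(dir)); // first caller removes the files
    System.out.println(deleteDirIfPresent(dir)); // second caller finds nothing, no exception
  }
}
```

The design choice in this sketch is to skip a separate existence check entirely and let the delete call itself report whether anything was removed, which avoids the check-then-act window.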
