9aman opened a new pull request, #16824:
URL: https://github.com/apache/pinot/pull/16824
## Problem
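RetentionManager took a long time to run on tables with many segments
because every segment exclusion check called List.contains(), which is
O(n). Repeating that check for each segment makes the scan roughly
quadratic. In the run below, identifying 56 untracked segments among
394,300 took about 32 minutes.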
```
03:18:05.742 INFO [RetentionManager] [pool-16-thread-12] Found: 394300
segments in deepstore for table: <tableName>. Time taken to list segments:
37021 ms
03:49:30.354 INFO [RetentionManager] [pool-16-thread-12] Took: 1921633 ms to
identify 56 segments for deletion from deep store for table: <tableName> as
they have no corresponding entry in the property store.
```
## Solution
Replaced List<String> with Set<String> for the segment collections, so each
membership check is an expected O(1) HashSet lookup instead of an O(n) list
scan. This removes the performance bottleneck in
findUntrackedSegmentsToDeleteFromDeepstore().
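
A minimal sketch of the idea, not the actual RetentionManager code: the
class and method names below (`UntrackedSegmentFinder`, `findUntracked`) and
the segment names are hypothetical. It copies the tracked segment names into
a HashSet once, then checks each deep store segment against that set.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UntrackedSegmentFinder {
  // Hypothetical helper: returns the deep store segments that have no
  // matching entry among the tracked (property store) segment names.
  static List<String> findUntracked(List<String> deepStoreSegments, List<String> trackedSegments) {
    // Build the set once; each contains() below is then an expected O(1)
    // hash lookup rather than a linear scan of a list.
    Set<String> tracked = new HashSet<>(trackedSegments);
    List<String> untracked = new ArrayList<>();
    for (String segment : deepStoreSegments) {
      if (!tracked.contains(segment)) {
        untracked.add(segment);
      }
    }
    return untracked;
  }

  public static void main(String[] args) {
    List<String> deepStore = List.of("seg_0", "seg_1", "seg_2", "seg_3");
    List<String> tracked = List.of("seg_0", "seg_2");
    System.out.println(findUntracked(deepStore, tracked)); // prints [seg_1, seg_3]
  }
}
```

With n deep store segments and m tracked segments, this does O(n + m)
expected work, whereas calling contains() on a List inside the loop does
O(n * m).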
## Impact
Significantly reduces RetentionManager runtime for tables with very large
segment counts. The added performance test checks that 400,000 segments are
processed within 30 seconds, which guards against future regressions.
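
A time-budget check of this kind could look roughly like the sketch below.
This is not the test added in the PR: the class name, helper methods, and
synthetic segment names are assumptions, and only the 400,000 segment count
and 30 second budget come from the description above. The 56 untracked
segments are borrowed from the log excerpt purely for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SegmentLookupBudgetCheck {
  // Builds synthetic segment names; the naming scheme is made up for this sketch.
  static List<String> syntheticSegments(int count) {
    List<String> names = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      names.add("segment_" + i);
    }
    return names;
  }

  // Counts candidates that are absent from the tracked set.
  static int countUntracked(List<String> candidates, Set<String> tracked) {
    int missing = 0;
    for (String name : candidates) {
      if (!tracked.contains(name)) {
        missing++;
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    int total = 400_000;        // segment count quoted in the PR description
    long budgetMillis = 30_000; // time budget quoted in the PR description
    List<String> deepStore = syntheticSegments(total);
    // Track all but the last 56 names, so 56 should come back as untracked.
    Set<String> tracked = new HashSet<>(deepStore.subList(0, total - 56));

    long start = System.nanoTime();
    int untracked = countUntracked(deepStore, tracked);
    long elapsedMillis = (System.nanoTime() - start) / 1_000_000;

    // Fail loudly if the result is wrong or the lookup phase exceeds the budget.
    if (untracked != 56 || elapsedMillis > budgetMillis) {
      throw new AssertionError("untracked=" + untracked + ", elapsedMillis=" + elapsedMillis);
    }
    System.out.println("untracked=" + untracked + ", elapsedMillis=" + elapsedMillis);
  }
}
```

A wall-clock budget is a coarse guard: it is generous enough to pass on slow
machines while still failing if the lookup regresses to a per-segment list
scan at this scale.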
### Sample runs using List and Set
<img width="720" height="176" alt="image"
src="https://github.com/user-attachments/assets/736766b1-f119-4145-b11f-49020b143fe6"
/>
<img width="720" height="137" alt="image"
src="https://github.com/user-attachments/assets/969cb81a-7317-4387-a847-ddf25aa46d75"
/>