zhannngchen opened a new pull request, #17011: URL: https://github.com/apache/doris/pull/17011
# Proposed changes

Issue Number: close #xxx

## Problem summary

For Unique Key MoW tables, each Segment object holds a PrimaryKeyIndexReader, which loads the primary key bloom filter into its member `_bf`. That memory is not released until the Segment object is evicted from SegmentCache.

A primary key bloom filter can consume a lot of memory. We've seen a user with 40000 handles (which is not a very large number) whose SegmentCache consumed 20GB of memory.

`_bf` is designed to be loaded on demand (currently it is loaded only when we need to look up a row key), but for MoW tables almost every segment needs to load it, because every segment must be checked for keys that should be marked as deleted.

In most MoW scenarios, users need real-time loads, and segments are retired very quickly due to frequent compaction, so we need to release Segment objects promptly.

In this PR, I propose pruning stale segments every 1 minute (currently every 30 minutes).

The prune operation is quite costly, since it has to hold the write lock while traversing all items in the cache and checking whether each one needs to be pruned. The traversal runs from the oldest item to the newest. If we prune only stale segments, we don't need to traverse all items every time: we can stop as soon as we find an item that is not stale. This way each item only needs to be checked once, so pruning stays efficient even when we run it frequently.

## Checklist(Required)

* [ ] Does it affect the original behavior
* [ ] Has unit tests been added
* [ ] Has document been added or modified
* [ ] Does it need to update dependencies
* [ ] Does this PR support rollback (If NO, please explain WHY)

## Further comments

If this is a relatively large or complex change, kick off the discussion on the mailing list by explaining why you chose the solution you did and what alternatives you considered, etc...
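
The early-stop pruning described in the problem summary can be illustrated with a minimal sketch. This is not the Doris SegmentCache code: `StaleAwareCache`, `touch`, `prune_stale`, and `PruneStats` are hypothetical names, and the sketch assumes that entries are kept in a list ordered from least to most recently visited, so that the first non-stale entry proves every later entry is non-stale too.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <list>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical result type: how many entries were removed and how many
// were looked at during one prune pass.
struct PruneStats {
    size_t pruned;
    size_t examined;
};

// Hypothetical stand-in for a segment cache; not the Doris SegmentCache API.
class StaleAwareCache {
public:
    // Inserts or refreshes `key`, recording `now_s` as its last visit time.
    // Callers must pass non-decreasing timestamps, which keeps the list
    // sorted from oldest (front) to newest (back).
    void touch(const std::string& key, int64_t now_s) {
        std::lock_guard<std::mutex> guard(_lock);
        assert(_lru.empty() || now_s >= _lru.back().last_visit_s);
        auto it = _index.find(key);
        if (it != _index.end()) {
            _lru.erase(it->second);
        }
        _lru.push_back(Entry{key, now_s});
        _index[key] = std::prev(_lru.end());
    }

    // Removes entries not visited for more than `stale_after_s` seconds.
    // Walks from the oldest entry and stops at the first one that is not
    // stale, so a pass costs (stale entries + 1) checks, not a full scan.
    PruneStats prune_stale(int64_t now_s, int64_t stale_after_s) {
        std::lock_guard<std::mutex> guard(_lock);
        PruneStats stats{0, 0};
        while (!_lru.empty()) {
            ++stats.examined;
            const Entry& oldest = _lru.front();
            if (now_s - oldest.last_visit_s <= stale_after_s) {
                break;  // every entry behind this one is at least as new
            }
            _index.erase(oldest.key);
            _lru.pop_front();
            ++stats.pruned;
        }
        return stats;
    }

    size_t size() const {
        std::lock_guard<std::mutex> guard(_lock);
        return _lru.size();
    }

private:
    struct Entry {
        std::string key;
        int64_t last_visit_s;
    };

    mutable std::mutex _lock;
    std::list<Entry> _lru;  // front = oldest, back = newest
    std::unordered_map<std::string, std::list<Entry>::iterator> _index;
};
```

With this ordering, the lock is held only for as long as it takes to drop the stale prefix of the list, which is why a short prune interval stays cheap in the sketch. A cache that does not keep entries in visit order would not get this guarantee and would still need a full scan.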
