zhannngchen opened a new pull request, #17011:
URL: https://github.com/apache/doris/pull/17011

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   For Unique Key MoW tables, each Segment object holds a PrimaryKeyIndexReader, 
which loads a primary key bloom filter into its member `_bf`. That memory is 
not released until the segment object is evicted from SegmentCache. Primary 
key bloom filters can consume a lot of memory: we've seen a user with 40000 
handles (not a particularly large number) whose SegmentCache consumed 20GB of 
memory.
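For scale, assuming each handle corresponds to one cached segment and taking 1 GB as 2^30 bytes, the reported figure works out to roughly half a mebibyte of SegmentCache memory per handle on average:

```math
\frac{20 \times 2^{30}\ \text{bytes}}{40000} \approx 536{,}871\ \text{bytes} \approx 524\ \text{KiB}
```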
   
   The `_bf` variable is designed to be loaded on demand (currently it is only 
loaded when we need to look up a row key), but for MoW tables almost every 
segment needs to load it (every segment must be checked for keys that should be 
marked as deleted). In most MoW scenarios users need real-time loading, and 
segments are retired very quickly due to frequent compaction, so we need to 
release segment objects promptly.
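The retention behaviour can be sketched as follows, assuming the reader builds its filter lazily on the first lookup and the segment cache keeps segments alive through `std::shared_ptr`. `BloomFilterSketch`, `PkIndexReaderSketch`, `SegmentSketch`, `ensure_bf_loaded`, and `total_resident_bf_bytes` are hypothetical names, not Doris classes, and a plain `std::vector` stands in for SegmentCache.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a primary key bloom filter; only its size matters here.
struct BloomFilterSketch {
    std::vector<uint8_t> bits;
};

// Hypothetical reader: the filter is built on first use and then kept for the
// reader's whole lifetime, i.e. until the owning segment is destroyed.
class PkIndexReaderSketch {
public:
    explicit PkIndexReaderSketch(size_t bf_bytes) : _bf_bytes(bf_bytes) {
        assert(bf_bytes > 0);
    }

    // Called on the row-key lookup path; only the first call allocates.
    void ensure_bf_loaded() {
        std::lock_guard<std::mutex> guard(_mutex);
        if (!_bf) {
            // Placeholder for reading the filter pages from the segment file.
            _bf = std::make_unique<BloomFilterSketch>();
            _bf->bits.resize(_bf_bytes);
        }
    }

    // Bytes of filter data currently held in memory by this reader.
    size_t resident_bf_bytes() const {
        std::lock_guard<std::mutex> guard(_mutex);
        return _bf ? _bf->bits.size() : 0;
    }

private:
    const size_t _bf_bytes;
    mutable std::mutex _mutex;
    std::unique_ptr<BloomFilterSketch> _bf;  // null until the first lookup
};

// Hypothetical segment: owns its reader, so the filter lives as long as the segment.
struct SegmentSketch {
    explicit SegmentSketch(size_t bf_bytes) : pk_reader(bf_bytes) {}
    PkIndexReaderSketch pk_reader;
};

// Filter memory pinned by every segment still held in the stand-in cache.
size_t total_resident_bf_bytes(const std::vector<std::shared_ptr<SegmentSketch>>& cache) {
    size_t total = 0;
    for (const auto& seg : cache) {
        total += seg->pk_reader.resident_bf_bytes();
    }
    return total;
}
```

The filter's bytes come back only when the last `shared_ptr` to its segment is dropped, so when nearly every segment gets probed (as in MoW loads), memory is bounded by how quickly retired segments leave the cache rather than by the lazy loading.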
   
   In this PR, I propose pruning stale segments every 1 minute (currently it's 
every 30 minutes).
   The prune operation is quite costly, since it holds the write lock while 
traversing all items in the cache to check whether they should be pruned. The 
traversal starts from the oldest item and moves toward the newest, so if we 
only try to prune stale segments, we don't need to traverse all items every 
time: we can stop as soon as we find an item that is not stale.
   In this way, each item is checked roughly once (a stale item is removed as 
soon as it is checked, and each pass stops at the first fresh one), so even 
frequent pruning stays cheap.
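A minimal sketch of the early-stopping prune described above, assuming entries are kept in LRU order by last visit time and that timestamps never go backwards. `StaleAwareLruCache`, `touch`, `prune_stale`, and `PruneStats` are hypothetical names, not Doris's SegmentCache or LRUCache API, and the sketch ignores entries still pinned by in-flight readers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical, simplified LRU cache (not Doris's SegmentCache/LRUCache API).
// Entries stay ordered by last visit time: front = newest, back = oldest.
class StaleAwareLruCache {
public:
    struct PruneStats {
        size_t examined = 0;  // entries whose timestamp was checked
        size_t pruned = 0;    // entries removed because they were stale
    };

    // Insert a key, or move an existing key to the newest end.
    // Assumes timestamps never go backwards, which keeps the list sorted.
    void touch(const std::string& key, int64_t now_sec) {
        std::lock_guard<std::mutex> guard(_mutex);
        assert(_lru.empty() || now_sec >= _lru.front().last_visit_sec);
        auto it = _index.find(key);
        if (it == _index.end()) {
            _lru.push_front(Entry{key, now_sec});
            _index.emplace(key, _lru.begin());
        } else {
            _lru.splice(_lru.begin(), _lru, it->second);
            it->second->last_visit_sec = now_sec;
        }
    }

    // Drop entries not visited for at least max_age_sec, scanning from the
    // oldest end and stopping at the first entry that is still fresh.
    PruneStats prune_stale(int64_t now_sec, int64_t max_age_sec) {
        std::lock_guard<std::mutex> guard(_mutex);  // plays the role of the write lock
        PruneStats stats;
        while (!_lru.empty()) {
            const Entry& oldest = _lru.back();
            ++stats.examined;
            if (now_sec - oldest.last_visit_sec < max_age_sec) {
                break;  // every newer entry is at least as fresh, so stop here
            }
            _index.erase(oldest.key);
            _lru.pop_back();
            ++stats.pruned;
        }
        return stats;
    }

    size_t size() const {
        std::lock_guard<std::mutex> guard(_mutex);
        return _lru.size();
    }

private:
    struct Entry {
        std::string key;
        int64_t last_visit_sec;
    };

    mutable std::mutex _mutex;
    std::list<Entry> _lru;
    std::unordered_map<std::string, std::list<Entry>::iterator> _index;
};
```

Each call examines at most one entry more than it removes, so scanning every minute stays cheap; the scan only reaches deep into the cache when that many entries really are stale.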
   
   ## Checklist(Required)
   
   * [ ] Does it affect the original behavior
   * [ ] Have unit tests been added
   * [ ] Has documentation been added or modified
   * [ ] Does it need to update dependencies
   * [ ] Does this PR support rollback (If NO, please explain WHY)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at 
[[email protected]](mailto:[email protected]) by explaining why you 
chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

