sollhui opened a new pull request, #61273:
URL: https://github.com/apache/doris/pull/61273

   ### Problem
   In the LRU-K implementation, when LRUCache::erase() is called, it removes 
the key from the main cache hash table but does not remove it from the visits 
list (_visits_lru_cache_map / _visits_lru_cache_list):
   
   
   void LRUCache::erase(const CacheKey& key, uint32_t hash) {
       ...
       e = _table.remove(key, hash);   // removed from main cache
       ...
       // visits list is NOT cleaned up ← missing
   }

   This matters when a segment is accessed exactly once (so it enters the visits 
list but is not yet promoted to the main cache) and is then erased before its 
second access. The typical scenario is compaction: when old rowsets are merged, 
SegmentLoader::erase_segments() is called for all segments of the old rowset.
   
   
   Timeline:
     1. Segment S accessed once → enters visits list, not in main cache
     2. Compaction merges the rowset containing S
     3. erase(S) called → S removed from main cache (no-op, wasn't there)
                        → S's visits list entry remains ← stale
     4. visits list entry for S keeps counting toward _visits_lru_cache_usage
        until it's evicted by LRU pressure from newer entries

   The visits list capacity is bounded by _capacity (the same as the main cache, 
~1.47 GB for SegmentCache). Stale entries accumulate and shrink the effective 
tracking window for live segments waiting to be promoted, slightly 
increasing the miss rate under compaction-heavy workloads.
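
The timeline can be replayed on a toy model to see the stale charge. This is only a sketch, not the Doris code: `StaleVisitsDemo`, `first_access`, `erase_without_fix`, and `stale_usage_after_erase` are hypothetical names, keys are plain strings, and the 4096-byte charge is an arbitrary illustrative value.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Toy model of the LRU-K bookkeeping described above (hypothetical, simplified).
struct StaleVisitsDemo {
    using VisitEntry = std::pair<std::string, std::size_t>;  // (key, charge)

    std::unordered_map<std::string, std::size_t> table;      // main cache: key -> charge
    std::list<VisitEntry> visits_list;                       // keys seen exactly once
    std::unordered_map<std::string, std::list<VisitEntry>::iterator> visits_map;
    std::size_t visits_usage = 0;                            // bytes charged to the visits list

    // A first access only records the key in the visits list.
    void first_access(const std::string& key, std::size_t charge) {
        visits_list.emplace_front(key, charge);
        visits_map[key] = visits_list.begin();
        visits_usage += charge;
    }

    // Mirrors the pre-fix behaviour: only the main table is touched.
    void erase_without_fix(const std::string& key) {
        table.erase(key);  // no-op when the key was never promoted
    }
};

// Replays steps 1-4 of the timeline and returns the bytes still charged for S.
inline std::size_t stale_usage_after_erase() {
    StaleVisitsDemo cache;
    cache.first_access("S", 4096);  // step 1: S enters the visits list
    cache.erase_without_fix("S");   // steps 2-3: compaction erases S
    return cache.visits_usage;      // step 4: the charge for S is still counted
}
```

In this model the returned usage equals the full charge of S, even though S can never be promoted again.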
   
   ### Fix
   In LRUCache::erase(), after removing the entry from the main cache, also 
check the visits list and remove the entry if present:
   
   if (_is_lru_k) {
       auto it = _visits_lru_cache_map.find(key.to_string());
       if (it != _visits_lru_cache_map.end()) {
           _visits_lru_cache_usage -= it->second->second;
           _visits_lru_cache_list.erase(it->second);
           _visits_lru_cache_map.erase(it);
       }
   }
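
The same cleanup can be exercised in isolation on a simplified stand-in for the cache. This is a sketch under assumptions, not the actual `LRUCache` class: `ToyLruKCache`, `record_visit`, `visits_usage`, `visits_count`, and `erase_cleans_visits` are hypothetical names, keys are plain strings instead of `CacheKey`, and locking, hashing, and promotion are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Simplified stand-in for an LRU-K cache (hypothetical; not the Doris class).
class ToyLruKCache {
public:
    // Records a first access: the key goes into the visits list only.
    void record_visit(const std::string& key, std::size_t charge) {
        _visits_list.emplace_front(key, charge);
        _visits_map[key] = _visits_list.begin();
        _visits_usage += charge;
    }

    // Erases the key from the main table and, as in the fix, from the visits list.
    void erase(const std::string& key) {
        _table.erase(key);
        auto it = _visits_map.find(key);
        if (it != _visits_map.end()) {
            _visits_usage -= it->second->second;  // give back the entry's charge
            _visits_list.erase(it->second);       // drop the list node
            _visits_map.erase(it);                // drop the index entry
        }
    }

    std::size_t visits_usage() const { return _visits_usage; }
    std::size_t visits_count() const { return _visits_map.size(); }

private:
    using VisitEntry = std::pair<std::string, std::size_t>;  // (key, charge)

    std::unordered_map<std::string, std::size_t> _table;     // main cache: key -> charge
    std::list<VisitEntry> _visits_list;
    std::unordered_map<std::string, std::list<VisitEntry>::iterator> _visits_map;
    std::size_t _visits_usage = 0;
};

// Checks that erasing a once-visited key releases its charge and leaves others intact.
inline bool erase_cleans_visits() {
    ToyLruKCache cache;
    cache.record_visit("S", 4096);
    cache.record_visit("T", 1024);
    cache.erase("S");
    return cache.visits_usage() == 1024 && cache.visits_count() == 1;
}
```

The charge is subtracted before the list node is erased because the list iterator stored in the map is invalidated by `std::list::erase`; the map iterator itself stays valid until the final `erase(it)`.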

