GitHub user PavelVesely added a comment to the discussion: What large-scale data or ML challenges are you facing?
> A common requirement in cache replacement algorithms is tracking element frequency and recency. Frequency can be computed directly with a count-min sketch, which has proven effective in LFU-based algorithms, including variants with windows and decay. However, I'm uncertain about how to properly design and use a sketch to track recency. Previously, I calculated it by combining a count-min sketch with the length of a virtual queue.

Tracking recency for cache replacement policies sounds like an interesting problem, but I'm not quite sure about the setting or which properties are desirable. If a cache simply stores all the items, then it can also store a timestamp per item (to get recency for LRU) or a frequency (for LFU), at the cost of additional memory proportional to the cache size.

- Is the goal to use additional memory *sublinear* in the cache size?
- What kind of error is tolerable for recency? For instance, is it acceptable to report an item accessed 1 s ago as the least recent, while another item has gone 1.5 s without an access?

GitHub link: https://github.com/apache/datasketches-rust/discussions/64#discussioncomment-15509769

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
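The reply above contrasts exact per-item timestamps (linear memory) with the question's goal of sketching recency sublinearly. One plausible construction, not something the datasketches-rust library provides, is a count-min-style table in which each cell keeps the maximum logical timestamp of any item hashed to it; the minimum over an item's cells then upper-bounds its true last-access time, since collisions can only make an item look *more* recent. A minimal Rust sketch, with the `RecencySketch` name and the `depth`/`width` parameters chosen here purely for illustration:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical count-min-style "recency sketch": each cell stores the
/// largest logical timestamp of any item hashing to it. A query takes the
/// minimum over the item's cells, which upper-bounds the true last-access
/// time (collisions only make items appear more recent, never less).
struct RecencySketch {
    depth: usize,
    width: usize,
    cells: Vec<u64>, // depth * width timestamps; 0 means "never seen"
    clock: u64,      // global logical clock, incremented on every access
}

impl RecencySketch {
    fn new(depth: usize, width: usize) -> Self {
        RecencySketch { depth, width, cells: vec![0; depth * width], clock: 0 }
    }

    // One cell index per row, seeded by the row number.
    fn index<T: Hash>(&self, item: &T, row: usize) -> usize {
        let mut h = DefaultHasher::new();
        row.hash(&mut h);
        item.hash(&mut h);
        row * self.width + (h.finish() as usize) % self.width
    }

    /// Record an access: bump the clock and raise the item's cells to it.
    fn access<T: Hash>(&mut self, item: &T) {
        self.clock += 1;
        for row in 0..self.depth {
            let i = self.index(item, row);
            self.cells[i] = self.cells[i].max(self.clock);
        }
    }

    /// Estimated last-access timestamp (an upper bound on the true one).
    fn last_access<T: Hash>(&self, item: &T) -> u64 {
        (0..self.depth)
            .map(|row| self.cells[self.index(item, row)])
            .min()
            .unwrap_or(0)
    }
}
```

Because the estimate only errs toward "more recent", an eviction policy driven by it would never evict something that was genuinely just touched; the cost is that stale items can be shielded by collisions, which wider rows mitigate.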
