Re: [I] Optimize the implementation of command RANDOMKEY [kvrocks]

via GitHub Mon, 19 May 2025 08:11:49 -0700


DCjanus commented on issue #1658:
URL: https://github.com/apache/kvrocks/issues/1658#issuecomment-2891363429


   
   After reading the previous replies, I had an idea that I’d like to share for 
further brainstorming. What if we introduce a new column family (CF) that 
stores a mapping from `hash(key) -> key`, where the key is the one stored in 
the metadata CF? 
   
   Assuming the chosen hash function distributes values evenly in most user 
scenarios, we could implement random sampling by generating a random hash 
value and seeking to the first entry at or after it in the new CF (wrapping 
around to the first entry if the seek runs past the end).
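   
   A minimal sketch of this sampling idea, using an in-memory sorted list as a 
stand-in for the new CF. The names `HashIndex`, `key_hash`, and `random_key` 
are hypothetical, and BLAKE2b truncated to 64 bits is only an assumed hash; 
none of this is existing kvrocks code.

```python
import bisect
import hashlib
import random

HASH_BITS = 64  # assumed hash width for this sketch


def key_hash(key: bytes) -> int:
    """Assumed hash: first 8 bytes of BLAKE2b, read as an unsigned integer."""
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")


class HashIndex:
    """In-memory stand-in for the proposed CF: entries sorted by (hash, key)."""

    def __init__(self):
        self._entries = []  # sorted list of (hash, key) tuples

    def add(self, key: bytes) -> None:
        entry = (key_hash(key), key)
        i = bisect.bisect_left(self._entries, entry)
        if i == len(self._entries) or self._entries[i] != entry:
            self._entries.insert(i, entry)

    def remove(self, key: bytes) -> None:
        entry = (key_hash(key), key)
        i = bisect.bisect_left(self._entries, entry)
        if i < len(self._entries) and self._entries[i] == entry:
            del self._entries[i]

    def random_key(self, rng=random):
        """Pick a random hash, then 'seek' to the first entry at or after it."""
        if not self._entries:
            return None
        target = rng.getrandbits(HASH_BITS)
        i = bisect.bisect_left(self._entries, (target, b""))
        if i == len(self._entries):
            i = 0  # wrap around to the first entry
        return self._entries[i][1]


index = HashIndex()
for k in (b"user:1", b"user:2", b"user:3"):
    index.add(k)
sample = index.random_key()
```

   Note that the result is only approximately uniform: each key is picked with 
probability proportional to the gap between its hash and the preceding one, 
which is why the even-distribution assumption matters.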
   
   Of course, this approach comes with some trade-offs:
   - It would require additional storage space.
   - We’d need a simple and efficient way to keep the metadata CF and this new 
CF relatively consistent (though strict consistency may not be necessary).
   
   Additionally, this mechanism could help us implement a more Redis-compatible 
SCAN command: we could return the last scanned hash value as the cursor, and 
the user could pass it back as the starting point for the next scan.
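   
   A rough sketch of such a hash-based cursor, again over a sorted in-memory 
list standing in for the new CF. The function names and the BLAKE2b-based hash 
are assumptions for illustration. Cursor `0` marks both the start and the end 
of iteration, as in Redis SCAN.

```python
import bisect
import hashlib


def key_hash(key: bytes) -> int:
    """Assumed hash: first 8 bytes of BLAKE2b, read as an unsigned integer."""
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")


def build_index(keys):
    """Stand-in for the proposed CF: entries sorted by (hash, key)."""
    return sorted((key_hash(k), k) for k in keys)


def scan(entries, cursor: int, count: int):
    """Return (next_cursor, keys), starting at the first hash >= cursor.

    A next_cursor of 0 means the iteration is complete.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    start = bisect.bisect_left(entries, (cursor, b""))
    end = start + count
    # Never split a run of equal hashes, or the next call would skip them.
    while start < end < len(entries) and entries[end][0] == entries[end - 1][0]:
        end += 1
    keys = [key for _, key in entries[start:end]]
    if end >= len(entries):
        return 0, keys
    # Resume just past the last hash that was returned.
    return entries[end - 1][0] + 1, keys


entries = build_index(b"key%d" % i for i in range(25))
seen = []
cursor = 0
while True:
    cursor, batch = scan(entries, cursor, 10)
    seen.extend(batch)
    if cursor == 0:
        break
```

   Because the cursor is a position in hash space rather than a key name, it 
stays meaningful even if the key it was derived from is deleted between calls.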
   
   How much the extra storage matters depends on the workload. For users with 
many `hash`/`set` containers, the growth should be relatively small, since the 
new CF adds one entry per key in the metadata CF rather than one per element. 
For workloads dominated by simple string keys (e.g., vector storage in ML 
scenarios), the cost might be more noticeable.
   
   This is just a rough idea, but I think it could be interesting to explore. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

