DCjanus commented on issue #1658: URL: https://github.com/apache/kvrocks/issues/1658#issuecomment-2891363429
After reading the previous replies, I had an idea that I’d like to share for further brainstorming.

What if we introduce a new column family (CF) that stores a mapping from `hash(key) -> key`, where the key is the one stored in the metadata CF? Assuming the chosen hash function distributes values evenly in most user scenarios, we could implement random sampling by simply generating a random hash value and seeking the nearest key.

Of course, this approach comes with some trade-offs:

- It would require additional storage space.
- We’d need a simple and efficient way to keep the metadata CF and this new CF relatively consistent (though strict consistency may not be necessary).

Additionally, this mechanism could help us implement a more Redis-compatible SCAN command: we could return the scanned hash value, and the user could use it as the starting point for the next scan.

Naturally, this would increase storage overhead. For users with many `hash`/`set` containers, the growth should be relatively small, but for workloads dominated by simple string keys (e.g., vector storage in ML scenarios), the cost might be more noticeable.

This is just a rough idea, but I think it could be interesting to explore.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
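
To make the `hash(key) -> key` idea in the comment above a bit more concrete, here is a minimal in-memory sketch in Python. It is not kvrocks code: the `HashIndex` class, the `key_hash` helper, the choice of a 64-bit BLAKE2b hash, and the "cursor 0 means done" convention are all assumptions made for illustration, and a sorted list stands in for the ordered column family.

```python
import bisect
import hashlib
import random


def key_hash(key: bytes) -> int:
    """Hypothetical 64-bit hash; the proposal does not name a hash function."""
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")


class HashIndex:
    """In-memory stand-in for the proposed hash(key) -> key column family."""

    def __init__(self):
        self._entries = []  # list of (hash, key) tuples, kept sorted

    def add(self, key: bytes) -> None:
        entry = (key_hash(key), key)
        i = bisect.bisect_left(self._entries, entry)
        if i == len(self._entries) or self._entries[i] != entry:
            self._entries.insert(i, entry)

    def remove(self, key: bytes) -> None:
        entry = (key_hash(key), key)
        i = bisect.bisect_left(self._entries, entry)
        if i < len(self._entries) and self._entries[i] == entry:
            del self._entries[i]

    def random_key(self, rng=random):
        """Pick a random 64-bit hash and 'seek' to the first entry at or after it."""
        if not self._entries:
            return None
        target = rng.getrandbits(64)
        i = bisect.bisect_left(self._entries, (target, b""))
        # Wrap around to the first entry when the target is past the last hash.
        return self._entries[i % len(self._entries)][1]

    def scan(self, cursor: int, count: int):
        """Return (next_cursor, keys); the cursor is a hash value, 0 means done."""
        i = bisect.bisect_left(self._entries, (cursor, b""))
        keys = [key for _, key in self._entries[i:i + count]]
        if i + count >= len(self._entries):
            return 0, keys
        # The next call resumes from the hash of the first unreturned entry.
        return self._entries[i + count][0], keys
```

A scan loop would start with cursor `0` and keep calling `scan` until it gets `0` back. Two caveats apply to this sketch: seeking from a random hash favors keys that follow a large gap in hash space, so the sampling is only approximately uniform, and entries sharing a hash at a batch boundary may be returned twice.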
