DCjanus commented on issue #1658:
URL: https://github.com/apache/kvrocks/issues/1658#issuecomment-2952519342
Hi all,
I've been looking into this issue and would like to propose a detailed
technical design to optimize the `RANDOMKEY` command for better performance and
randomness.
The core idea is to implement a two-stage sampling approach that avoids
scanning the entire keyspace.
I'd also like to give a shout-out to Gemini 2.5 Pro. Its assistance was
invaluable in refining some of the details of this proposal.
Here is the proposed design. I'm looking forward to hearing your thoughts
and feedback.
---
### A High-Performance `RANDOMKEY` Implementation
This proposal aims to make the `RANDOMKEY` command both high-performance and
highly random through pre-sampling and a two-level randomization strategy.
#### A. Core Data Structure
We will introduce a new RocksDB Column Family (CF) named `sampling_keys`.
* **Purpose**: To store a sparse, uniform sample of keys from the
`metadata` CF. This CF serves as a "fast index" for locating a random position
within the entire keyspace in O(log N) time.
* **Key Format**: `[8-byte Big-Endian hash] + [user_key]`
* `hash`: A 64-bit unsigned integer calculated from the `user_key`
using a fast, non-cryptographic hash function (e.g., MurmurHash3). It's stored
in Big-Endian format to ensure that the byte order corresponds to the numerical
order, which is crucial for `Seek` operations.
* `user_key`: The original user key. This composite key design
leverages RocksDB's sorted nature to enable efficient random positioning via
the hash prefix, while the appended `user_key` inherently resolves any hash
collisions.
* **Value Format**: Empty. All necessary information is encoded in the key.
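As a sketch of the composite-key layout (the `Hash64` and `EncodeSampleKey` names are illustrative, and FNV-1a is used here only as a stand-in for the fast non-cryptographic hash the proposal suggests, e.g. MurmurHash3):

```cpp
#include <cstdint>
#include <string>

// Illustrative 64-bit hash (FNV-1a); the proposal suggests a fast
// non-cryptographic hash such as MurmurHash3.
uint64_t Hash64(const std::string &key) {
  uint64_t h = 1469598103934665603ULL;
  for (unsigned char c : key) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

// Build the composite key: [8-byte big-endian hash] + [user_key].
// Big-endian encoding makes lexicographic byte order match the hash's
// numeric order, so Seek() on the hash prefix behaves as expected.
std::string EncodeSampleKey(const std::string &user_key) {
  uint64_t h = Hash64(user_key);
  std::string out;
  out.reserve(8 + user_key.size());
  for (int i = 7; i >= 0; --i) {
    out.push_back(static_cast<char>((h >> (i * 8)) & 0xFF));
  }
  out += user_key;
  return out;
}
```

Because `std::char_traits<char>` compares bytes as unsigned, the lexicographic order of the 8-byte prefixes agrees with the numeric order of the hashes, which is the property `Seek` relies on.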
#### B. Background Sampling Task
This will be a periodic background task with the following logic:
1. **Trigger**: The task monitors the list of active (live) SST files in
the `metadata` CF. It keeps track of the SST files it has already processed to
avoid redundant work.
2. **Sampling**: When new SST files are detected, the task starts an
iterator to scan all their key-value pairs. For each `user_key`, its 64-bit
hash is computed.
3. **Selection**: A key is selected as a sample if `hash(user_key) % 128 ==
0`, i.e., roughly one in every 128 keys is sampled.
4. **Insertion**: The generated composite key `[hash] + [user_key]` is
written to the `sampling_keys` CF.
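The selection rule above can be sketched as follows (`ShouldSample` and `CountSampled` are illustrative names, and FNV-1a again stands in for the real hash):

```cpp
#include <cstdint>
#include <string>

// Illustrative FNV-1a hash; the proposal suggests MurmurHash3.
uint64_t Hash64(const std::string &key) {
  uint64_t h = 1469598103934665603ULL;
  for (unsigned char c : key) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

// The selection rule: keep a key iff its hash is divisible by 128,
// giving an expected sampling rate of 1/128.
bool ShouldSample(const std::string &user_key) {
  return Hash64(user_key) % 128 == 0;
}

// Helper for eyeballing the rate: how many of n synthetic keys
// ("key-0" .. "key-<n-1>") would be sampled.
int CountSampled(int n) {
  int count = 0;
  for (int i = 0; i < n; ++i) {
    if (ShouldSample("key-" + std::to_string(i))) ++count;
  }
  return count;
}
```

Since the rule is a pure function of the key, re-sampling the same SST file (or the same key appearing in several files) always yields the same decision, so duplicate inserts into `sampling_keys` are idempotent.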
#### C. Background Maintenance via Compaction Filter
To prevent the `sampling_keys` CF from growing indefinitely with stale data,
we'll use a `CompactionFilter`.
* **Purpose**: To keep the `sampling_keys` CF clean by removing samples of
keys that no longer exist.
* **Mechanism**: A custom `CompactionFilter` is registered for the
`sampling_keys` CF.
* **Filter Logic**: During a RocksDB compaction, for each entry in
`sampling_keys`, the filter extracts the `user_key` part and checks for its
existence in the `metadata` CF. If the `user_key` is not found, the filter
returns `true`, and the entry is discarded.
* *Note*: This is a best-effort, background cleanup process. The
correctness of `RANDOMKEY` does not depend on the real-time execution of this
cleanup.
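A minimal model of the filter decision, with a `std::set` standing in for the existence check against the `metadata` CF (in the real implementation this would be a RocksDB point lookup; `ShouldDiscard` is an illustrative name):

```cpp
#include <set>
#include <string>

// Model of the compaction-filter decision: an entry in sampling_keys is
// discarded (the filter returns true) when the user_key embedded after
// the 8-byte hash prefix no longer exists. `live_keys` stands in for a
// point lookup against the metadata CF.
bool ShouldDiscard(const std::string &sample_key,
                   const std::set<std::string> &live_keys) {
  if (sample_key.size() < 8) return true;       // malformed entry
  std::string user_key = sample_key.substr(8);  // strip the hash prefix
  return live_keys.count(user_key) == 0;        // stale: key was deleted
}
```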
#### D. `RANDOMKEY` Command Execution Flow
The execution of the `RANDOMKEY` command will follow these steps:
1. **Generate a Random Seek Point**: Generate a uniformly distributed 64-bit
random integer `h` in `[0, UINT64_MAX]` and encode it as 8 big-endian bytes to
form the `seek_key`.
2. **Locate an Anchor in `sampling_keys`**:
* Create an iterator `iter` for the `sampling_keys` CF.
* Execute `iter->Seek(seek_key)`. This positions the iterator at the
first key that is `>= seek_key`.
* If `iter` becomes invalid (i.e., `seek_key` is larger than all
existing keys), execute `iter->SeekToFirst()` to wrap around, ensuring an
anchor key is always found.
3. **Handle Hash Collisions and Select a Starting Key**:
* From the key at the iterator's current position, extract the 8-byte
hash prefix; call it `found_hash`.
* Collect the key at the current position, together with any subsequent
keys sharing the same `found_hash` prefix, into a temporary
`std::vector<std::string> candidates`.
* Randomly select a key from the `candidates` list and extract its
`user_key` part. This `user_key` will be our **starting anchor, `start_key`**,
for the scan in the `metadata` CF.
4. **Scan `metadata` CF and Select the Final Key**:
* Create an iterator `meta_iter` for the `metadata` CF.
* Execute `meta_iter->Seek(start_key)`. This will position `meta_iter`
at the first valid key in the `metadata` CF that is `>= start_key`.
* Create a temporary `std::vector<std::string> results`.
* Starting from `meta_iter`'s current position, add the key at each
position to `results`, advancing with `meta_iter->Next()`, until 128 keys
have been collected.
* If `meta_iter` becomes invalid before 128 keys are collected, execute
`meta_iter->SeekToFirst()` and continue collecting from the beginning of the
`metadata` CF until `results` contains 128 keys (or every key, if the
keyspace holds fewer than 128).
* Randomly select a key from the `results` list and return it as the
result of the `RANDOMKEY` command.
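The whole flow can be simulated with `std::map`/`std::set` standing in for the two sorted CFs, where `lower_bound()` plays the role of `iter->Seek()`. All names here (`Hash64`, `EncodeHash`, `RandomKey`) are illustrative, not Kvrocks code, and FNV-1a again replaces the real hash:

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <random>
#include <set>
#include <string>
#include <vector>

// FNV-1a stand-in for the sampling hash.
uint64_t Hash64(const std::string &key) {
  uint64_t h = 1469598103934665603ULL;
  for (unsigned char c : key) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

// 8-byte big-endian encoding of a 64-bit hash.
std::string EncodeHash(uint64_t h) {
  std::string out;
  for (int i = 7; i >= 0; --i) {
    out.push_back(static_cast<char>((h >> (i * 8)) & 0xFF));
  }
  return out;
}

// Steps 1-4 of the command flow. `metadata` models the metadata CF;
// `samples` models the sampling_keys CF (entries are [hash]+[user_key]).
std::string RandomKey(const std::map<std::string, std::string> &metadata,
                      const std::set<std::string> &samples,
                      std::mt19937_64 &rng) {
  if (metadata.empty()) return "";
  // 1. Random 64-bit seek point, big-endian encoded.
  std::string seek_key = EncodeHash(rng());
  // 2. Locate an anchor in the sample set, wrapping around if needed.
  auto it = samples.lower_bound(seek_key);
  if (it == samples.end()) it = samples.begin();
  std::string start_key;
  if (it == samples.end()) {
    start_key = metadata.begin()->first;  // empty sample set: start at front
  } else {
    // 3. Collect every sample sharing the anchor's hash prefix, pick one.
    std::string prefix = it->substr(0, 8);
    std::vector<std::string> candidates;
    for (; it != samples.end() && it->compare(0, 8, prefix) == 0; ++it) {
      candidates.push_back(it->substr(8));
    }
    start_key = candidates[rng() % candidates.size()];
  }
  // 4. Scan up to 128 metadata keys from start_key (with wrap-around)
  //    and return one of them at random.
  std::vector<std::string> results;
  size_t want = std::min<size_t>(128, metadata.size());
  auto mit = metadata.lower_bound(start_key);
  while (results.size() < want) {
    if (mit == metadata.end()) mit = metadata.begin();
    results.push_back(mit->first);
    ++mit;
  }
  return results[rng() % results.size()];
}

// Tiny deterministic demo: three keys, one of them sampled.
std::string Demo() {
  std::map<std::string, std::string> metadata{{"a", ""}, {"b", ""}, {"c", ""}};
  std::set<std::string> samples{EncodeHash(Hash64("b")) + "b"};
  std::mt19937_64 rng(12345);
  return RandomKey(metadata, samples, rng);
}
```

Note the cap at `metadata.size()` when the keyspace holds fewer than 128 keys; without it, the wrap-around in step 4 would never terminate on a small keyspace.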
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]