DCjanus commented on issue #1658:
URL: https://github.com/apache/kvrocks/issues/1658#issuecomment-2952519342

   Hi all,
   
   I've been looking into this issue and would like to propose a detailed 
technical design to optimize the `RANDOMKEY` command for better performance and 
randomness.
   
   The core idea is to implement a two-stage sampling approach that avoids 
scanning the entire keyspace.
   
   I'd also like to give a shout-out to Gemini 2.5 Pro. Its assistance was 
invaluable in refining some of the details of this proposal.
   
   Here is the proposed design. I'm looking forward to hearing your thoughts 
and feedback.
   
   ---
   
   ### A High-Performance `RANDOMKEY` Implementation
   
   This proposal aims to achieve a high-performance and highly random 
`RANDOMKEY` command through pre-sampling and a two-level randomization strategy.
   
   #### A. Core Data Structure
   
   We will introduce a new RocksDB Column Family (CF) named `sampling_keys`.
   
   *   **Purpose**: To store a sparse, uniform sample of keys from the `metadata` CF. This CF serves as a "fast index" to locate a random position within the entire keyspace in O(log N) time.
   *   **Key Format**: `[8-byte Big-Endian hash] + [user_key]`
       *   `hash`: A 64-bit unsigned integer calculated from the `user_key` 
using a fast, non-cryptographic hash function (e.g., MurmurHash3). It's stored 
in Big-Endian format to ensure that the byte order corresponds to the numerical 
order, which is crucial for `Seek` operations.
       *   `user_key`: The original user key. This composite key design 
leverages RocksDB's sorted nature to enable efficient random positioning via 
the hash prefix, while the appended `user_key` inherently resolves any hash 
collisions.
   *   **Value Format**: Empty. All necessary information is encoded in the key.
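
   To make the key layout concrete, here is a minimal encoding sketch. `Hash64`, `EncodeSamplingKey`, and `DecodeUserKey` are illustrative names rather than existing Kvrocks APIs, and `std::hash` is only a stand-in for the real hash function:

   ```cpp
   #include <cstdint>
   #include <functional>
   #include <string>

   // Stand-in for a fast, non-cryptographic 64-bit hash (e.g. MurmurHash3).
   // NOTE: std::hash is not stable across processes/builds, so a real
   // implementation must use a hash with a fixed, documented algorithm.
   uint64_t Hash64(const std::string &user_key) {
     return std::hash<std::string>{}(user_key);
   }

   // Encode the composite key: [8-byte Big-Endian hash] + [user_key].
   std::string EncodeSamplingKey(const std::string &user_key) {
     uint64_t h = Hash64(user_key);
     std::string out;
     out.reserve(8 + user_key.size());
     for (int i = 7; i >= 0; --i) {
       // Most significant byte first, so lexicographic order == numeric order.
       out.push_back(static_cast<char>((h >> (i * 8)) & 0xFF));
     }
     out.append(user_key);
     return out;
   }

   // Decode: the user_key is everything after the fixed 8-byte hash prefix.
   std::string DecodeUserKey(const std::string &sampling_key) {
     return sampling_key.substr(8);
   }
   ```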
   
   #### B. Background Sampling Task
   
   This will be a periodic background task with the following logic:
   
   1.  **Trigger**: The task monitors the list of active (live) SST files in 
the `metadata` CF. It keeps track of the SST files it has already processed to 
avoid redundant work.
   2.  **Sampling**: When new SST files are detected, the task starts an 
iterator to scan all their key-value pairs. For each `user_key`, its 64-bit 
hash is computed.
   3.  **Selection**: A key is selected as a sample if `hash(user_key) % 128 == 0`, i.e., roughly one out of every 128 keys is kept.
   4.  **Insertion**: The generated composite key `[hash] + [user_key]` is 
written to the `sampling_keys` CF.
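
   As a rough sketch of the per-file sampling pass, assuming `rocksdb::SstFileReader` is used to scan newly discovered files, reusing the illustrative `Hash64`/`EncodeSamplingKey` helpers from section A, and ignoring Kvrocks' actual metadata key encoding for brevity:

   ```cpp
   #include <rocksdb/db.h>
   #include <rocksdb/sst_file_reader.h>

   #include <memory>
   #include <string>

   constexpr uint64_t kSampleRatio = 128;  // keep roughly 1 out of every 128 keys

   // Scan one newly discovered SST file of the metadata CF and write the
   // selected samples into the sampling_keys CF.
   rocksdb::Status SampleSstFile(rocksdb::DB *db, rocksdb::ColumnFamilyHandle *sampling_cf,
                                 const rocksdb::Options &options, const std::string &sst_path) {
     rocksdb::SstFileReader reader(options);
     rocksdb::Status s = reader.Open(sst_path);
     if (!s.ok()) return s;

     std::unique_ptr<rocksdb::Iterator> it(reader.NewIterator(rocksdb::ReadOptions()));
     for (it->SeekToFirst(); it->Valid(); it->Next()) {
       std::string user_key = it->key().ToString();
       // Selection rule: keep the key iff its hash falls on the sampled residue.
       if (Hash64(user_key) % kSampleRatio != 0) continue;
       s = db->Put(rocksdb::WriteOptions(), sampling_cf, EncodeSamplingKey(user_key), "");
       if (!s.ok()) return s;
     }
     return it->status();
   }
   ```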
   
   #### C. Background Maintenance via Compaction Filter
   
   To prevent the `sampling_keys` CF from growing indefinitely with stale data, 
we'll use a `CompactionFilter`.
   
   *   **Purpose**: To keep the `sampling_keys` CF clean by removing samples of 
keys that no longer exist.
   *   **Mechanism**: A custom `CompactionFilter` is registered for the 
`sampling_keys` CF.
   *   **Filter Logic**: During a RocksDB compaction, for each entry in 
`sampling_keys`, the filter extracts the `user_key` part and checks for its 
existence in the `metadata` CF. If the `user_key` is not found, the filter 
returns `true`, and the entry is discarded.
       *   *Note*: This is a best-effort, background cleanup process. The 
correctness of `RANDOMKEY` does not depend on the real-time execution of this 
cleanup.
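
   Sketched below is what that filter could look like. This is illustrative only: a real version would be installed through a `CompactionFilterFactory`, and the existence check would have to follow Kvrocks' metadata encoding (e.g., treating expired metadata as absent):

   ```cpp
   #include <rocksdb/compaction_filter.h>
   #include <rocksdb/db.h>

   #include <string>

   class SamplingKeysCleanupFilter : public rocksdb::CompactionFilter {
    public:
     SamplingKeysCleanupFilter(rocksdb::DB *db, rocksdb::ColumnFamilyHandle *metadata_cf)
         : db_(db), metadata_cf_(metadata_cf) {}

     // Returning true drops the entry from sampling_keys during compaction.
     bool Filter(int /*level*/, const rocksdb::Slice &key, const rocksdb::Slice & /*value*/,
                 std::string * /*new_value*/, bool * /*value_changed*/) const override {
       if (key.size() <= 8) return true;  // malformed entry: discard
       // Strip the 8-byte Big-Endian hash prefix to recover the user_key.
       rocksdb::Slice user_key(key.data() + 8, key.size() - 8);
       std::string unused;
       rocksdb::Status s = db_->Get(rocksdb::ReadOptions(), metadata_cf_, user_key, &unused);
       return s.IsNotFound();  // stale sample: the key no longer exists
     }

     const char *Name() const override { return "SamplingKeysCleanupFilter"; }

    private:
     rocksdb::DB *db_ = nullptr;
     rocksdb::ColumnFamilyHandle *metadata_cf_ = nullptr;
   };
   ```

   One design caveat: cross-CF `Get()` calls from inside a compaction filter add read amplification to compactions of `sampling_keys`, which seems acceptable here precisely because the cleanup is explicitly best-effort.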
   
   #### D. `RANDOMKEY` Command Execution Flow
   
   The execution of the `RANDOMKEY` command will follow these steps; a consolidated code sketch follows the list:
   
   1.  **Generate a Random Seek Point**: Generate a cryptographically secure 
64-bit random integer `h` within the range `[0, UINT64_MAX]`. This is converted 
to an 8-byte Big-Endian format to be used as the `seek_key`.
   
   2.  **Locate an Anchor in `sampling_keys`**:
       *   Create an iterator `iter` for the `sampling_keys` CF.
       *   Execute `iter->Seek(seek_key)`. This positions the iterator at the 
first key that is `>= seek_key`.
       *   If `iter` becomes invalid (i.e., `seek_key` is larger than all existing keys), execute `iter->SeekToFirst()` to wrap around. This guarantees an anchor key is found whenever `sampling_keys` is non-empty; an empty CF (e.g., right after startup) needs a separate fallback path.
   
   3.  **Handle Hash Collisions and Select a Starting Key**:
       *   From the key at the iterator's current position, extract the 8-byte 
hash prefix, let's call it `found_hash`.
       *   Starting from the current position, collect all consecutive keys sharing `found_hash` into a temporary `std::vector<std::string> candidates`.
       *   Randomly select a key from the `candidates` list and extract its 
`user_key` part. This `user_key` will be our **starting anchor, `start_key`**, 
for the scan in the `metadata` CF.
   
   4.  **Scan `metadata` CF and Select the Final Key**:
       *   Create an iterator `meta_iter` for the `metadata` CF.
       *   Execute `meta_iter->Seek(start_key)`. This will position `meta_iter` 
at the first valid key in the `metadata` CF that is `>= start_key`.
       *   Create a temporary `std::vector<std::string> results`.
       *   Starting from `meta_iter`'s current position, collect up to 128 keys into the `results` list, advancing with `meta_iter->Next()` after each key.
       *   If `meta_iter` becomes invalid before 128 keys have been collected, execute `meta_iter->SeekToFirst()` and continue collecting from the beginning of the `metadata` CF until the `results` list contains 128 keys (or the entire keyspace has been visited once, so the scan terminates when the database holds fewer than 128 keys).
       *   Randomly select a key from the `results` list and return it as the 
result of the `RANDOMKEY` command.
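
   Putting the four steps together, here is a condensed sketch of the whole flow. Helper names are illustrative; error handling and the fallback policy for an empty `sampling_keys` CF are elided:

   ```cpp
   #include <rocksdb/db.h>

   #include <memory>
   #include <random>
   #include <string>
   #include <vector>

   std::string RandomKey(rocksdb::DB *db, rocksdb::ColumnFamilyHandle *sampling_cf,
                         rocksdb::ColumnFamilyHandle *metadata_cf) {
     // Step 1: random 64-bit seek point, Big-Endian to match the key layout.
     // std::mt19937_64 is a placeholder; the proposal calls for a CSPRNG.
     std::mt19937_64 rng{std::random_device{}()};
     uint64_t h = rng();
     std::string seek_key;
     for (int i = 7; i >= 0; --i) seek_key.push_back(static_cast<char>((h >> (i * 8)) & 0xFF));

     // Step 2: locate an anchor in sampling_keys, wrapping around at the end.
     std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(rocksdb::ReadOptions(), sampling_cf));
     it->Seek(seek_key);
     if (!it->Valid()) it->SeekToFirst();
     if (!it->Valid()) return "";  // sampling_keys is empty: fallback path needed

     // Step 3: gather every sample sharing the anchor's 8-byte hash prefix.
     std::string found_hash = it->key().ToString().substr(0, 8);
     std::vector<std::string> candidates;
     for (; it->Valid() && it->key().starts_with(found_hash); it->Next()) {
       candidates.push_back(it->key().ToString().substr(8));  // DecodeUserKey
     }
     std::string start_key = candidates[rng() % candidates.size()];

     // Step 4: collect up to 128 metadata keys starting at start_key, wrapping
     // around at most once so the loop terminates on keyspaces with < 128 keys.
     std::unique_ptr<rocksdb::Iterator> mit(db->NewIterator(rocksdb::ReadOptions(), metadata_cf));
     std::vector<std::string> results;
     mit->Seek(start_key);
     for (int wraps = 0; results.size() < 128 && wraps < 2;) {
       if (!mit->Valid()) {
         mit->SeekToFirst();
         ++wraps;
         continue;
       }
       results.push_back(mit->key().ToString());
       mit->Next();
     }
     if (results.empty()) return "";
     return results[rng() % results.size()];
   }
   ```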

