GitHub user LiuQhahah edited a discussion: Cuckoo Filter Design Proposal
# 1. Introduction

In https://github.com/apache/kvrocks/issues/2534, we plan to support the cuckoo filter. This document outlines the design for implementing a Cuckoo Filter in Kvrocks. The Cuckoo Filter is a high-performance, probabilistic data structure used for approximate set membership testing. It allows fast checks of whether an item is in a set and, unlike a Bloom Filter, supports item deletion.

# 2. Core Concepts

* Fingerprint: Instead of storing the items themselves, the Cuckoo Filter stores a "fingerprint" of each item, which is a small hash derived from the original item.
* Two Candidate Buckets: Any given item can be stored in one of two possible buckets. The locations of these buckets are determined by two hash functions.
* Cuckoo Hashing / Kicking: When a new item needs to be inserted and both of its candidate buckets are full, the filter evicts an existing fingerprint from one of the buckets to make space. The evicted fingerprint is then re-inserted into its alternate bucket, which may cause a cascade of evictions. This process continues until every fingerprint finds a home or a maximum number of evictions is reached.
* Expansion: If the kicking process fails to place an item after a set number of attempts, the filter is considered full. An expansion is then triggered, increasing the total capacity (in this design, by appending a new sub-filter) to accommodate more data.

# 3. Data Structure Design

To implement the Cuckoo Filter efficiently in Kvrocks, we adopt a sharded (or chained) storage model. This model breaks a large, logical Cuckoo Filter into smaller, manageable chunks, each stored as a separate key-value pair in RocksDB.

## 3.1. Metadata

The metadata stores the overall information and configuration for the Cuckoo Filter. Each Cuckoo Filter instance corresponds to a single metadata key-value pair in RocksDB.

* Key Format: `__namespace__:cuckoo_filter:{user_key}`
* Value Format: A serialized `CuckooChainMetadata` object.
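Concretely, a packed 26-byte encoding of this metadata can be sketched with Python's `struct` module. This is only an illustration: the field order mirrors the `CuckooChainMetadata` table below, but the exact on-disk layout used by Kvrocks is an assumption here.

```python
import struct

# Illustrative little-endian, unpadded layout (an assumption, not Kvrocks' actual encoder):
# n_filters (u32), capacity (u64), size (u64),
# max_iterations (u32), bucket_size (u8), expansion (u8)
CUCKOO_META_FMT = "<IQQIBB"

def encode_metadata(n_filters: int, capacity: int, size: int,
                    max_iterations: int, bucket_size: int,
                    expansion: int) -> bytes:
    """Serialize the metadata fields into a fixed-size 26-byte blob."""
    return struct.pack(CUCKOO_META_FMT, n_filters, capacity, size,
                       max_iterations, bucket_size, expansion)

def decode_metadata(blob: bytes) -> tuple:
    """Parse a blob back into
    (n_filters, capacity, size, max_iterations, bucket_size, expansion)."""
    return struct.unpack(CUCKOO_META_FMT, blob)
```

With `<` (little-endian, no padding) the record is exactly 4 + 8 + 8 + 4 + 1 + 1 = 26 bytes, matching the size stated below; a C++ struct with default alignment could be slightly larger, which is why the text hedges that the size "may vary slightly with memory alignment."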
The structure of this object is detailed below.

### CuckooChainMetadata Structure

The metadata value is a tightly packed binary structure. The total size is typically 26 bytes, though this may vary slightly with memory alignment.

| Field Name | Data Type | Size (Bytes) | Description |
|---|---|---|---|
| n_filters | uint32_t | 4 | The number of filter shards in the chain. |
| capacity | uint64_t | 8 | The number of buckets stored in each individual filter shard. |
| size | uint64_t | 8 | The current number of items stored in the filter chain. |
| max_iterations | uint32_t | 4 | The maximum number of kicks allowed during an insertion before expansion. |
| bucket_size | uint8_t | 1 | The number of fingerprints (slots) each bucket can hold (e.g., 2, 4). |
| expansion | uint8_t | 1 | The factor by which the filter's capacity grows during expansion (e.g., 2 for doubling). |

## 3.2. Filter Data

To ensure high performance and scalability, a single logical sub-filter can be partitioned into multiple smaller chunks. This strategy avoids the performance pitfalls of storing large values in RocksDB. The binary data of the filter is stored in one or more data chunks, each stored as a separate key-value pair. The key format is designed to uniquely identify each chunk within its logical sub-filter.

* Key Format: `__cf__:{metadata_key}:{filter_index}:{chunk_index}`
  * `__cf__`: A fixed prefix to identify Cuckoo Filter data.
  * `{metadata_key}`: The same key used for the metadata.
  * `{filter_index}`: The 0-based index of the logical sub-filter in the chain.
  * `{chunk_index}`: The 0-based index of the data chunk within that logical sub-filter.
* Value Format: The binary data of that specific chunk, containing the actual fingerprints. This is treated as an opaque blob by RocksDB.

### Example
Consider a filter `my_filter` that has been expanded once. The first sub-filter is small and fits in one chunk, while the second, larger sub-filter is split into two chunks.

Metadata Key:

* Key: `__namespace__:cuckoo_filter:my_filter`
* Value: (Serialized metadata, e.g., `{n_filters: 2, filter_chunks: [1, 2], max_chunk_size: 1048576, ...}`)

(Note: `filter_chunks: [1, 2]` indicates that filter 0 has 1 chunk and filter 1 has 2 chunks.)

Data Chunks for Filter 0:

* Key: `__cf__:__namespace__:cuckoo_filter:my_filter:0:0`
* Value: (Binary data for the first sub-filter)

Data Chunks for Filter 1:

* Key 1: `__cf__:__namespace__:cuckoo_filter:my_filter:1:0`
* Value: (Binary data for the first half of the second sub-filter)
* Key 2: `__cf__:__namespace__:cuckoo_filter:my_filter:1:1`
* Value: (Binary data for the second half of the second sub-filter)

## 3.3. Storage Layout in RocksDB

The following listing illustrates how the example filter above (two sub-filters, three chunks) is stored in RocksDB:

```
RocksDB Storage (Optimized Schema)

Metadata Entry
  Key:   __namespace__:cuckoo_filter:my_filter
  Value: (serialized metadata)
         {
           "n_filters": 2,
           "filter_chunks": [1, 2],    // filter 0 has 1 chunk, filter 1 has 2 chunks
           "max_chunk_size": 1048576,  // max size per chunk is 1 MB
           "bucket_size": 2,
           "expansion": 2,
           ...
         }

Data Chunk for Filter #0
  Key:   __cf__:__namespace__:cuckoo_filter:my_filter:0:0
         // trailing "0:0" = filter index : chunk index
  Value: (binary data for the first filter, size <= 1 MB)

Data Chunk #0 for Filter #1
  Key:   __cf__:__namespace__:cuckoo_filter:my_filter:1:0
  Value: (binary data for the first half of the second filter, size <= 1 MB)

Data Chunk #1 for Filter #1
  Key:   __cf__:__namespace__:cuckoo_filter:my_filter:1:1
  Value: (binary data for the second half of the second filter, size <= 1 MB)
```

# 4. Operational Flow

## 4.1. Add Operation (CF.ADD)

1. Read Metadata: Read the metadata for the given `user_key`.
2. Calculate the two candidate bucket locations, `(filter_idx1, bucket_in_filter1)` and `(filter_idx2, bucket_in_filter2)`.
3. Read the data for the corresponding one or two shards from RocksDB.
4. Attempt to insert the fingerprint into an empty slot in either bucket.
5. If both buckets are full, begin the cuckoo kicking process, which may involve reading and writing different shards as fingerprints are moved.
6. If insertion succeeds, increment `size` in the metadata and atomically write the updated metadata and modified shard(s) back to RocksDB.
7. If kicking exceeds `max_iterations`, the insertion fails; trigger the Expansion process described in Section 4.2.

## 4.2. Expansion

1. Update Metadata:
   * Increment `n_filters`.
   * Calculate the new total capacity based on the `expansion` factor.
2. Create New Shard:
   * Generate a new, empty filter data block.
   * Atomically write the new metadata and the new filter shard to RocksDB.
3. Retry Insert: After expansion, re-attempt the insert operation, which will now target the newly created empty shard.

## 4.3. Exists Operation (CF.EXISTS)

1. Read Metadata: Read the metadata for the `user_key`.
2. Iterate Through Shards:
   * Loop from `shard_index = 0` to `n_filters - 1`.
   * For each shard, read its data from RocksDB and check whether the item's fingerprint is present.
   * If the item is found in any shard, return True immediately.
3. Return Result: If the loop completes without finding the item, return False.

## 4.4. Delete Operation (CF.DELETE)

The delete operation is similar to the exists operation, requiring a search through the shards.

1. Read Metadata: Read the metadata for the `user_key`.
2. Iterate Through Shards:
   * Loop from `shard_index = 0` to `n_filters - 1`.
   * For each shard, read its data and attempt to delete the item's fingerprint.
   * If the fingerprint is successfully deleted, update `size` in the metadata, write the updated metadata and shard data back to RocksDB, and return True.
3. Return Result: If the loop completes without deleting the item, return False.

# 5. Concurrency Safety

* Atomic Writes: All write operations (ADD, DELETE, EXPAND) must be atomic to ensure data consistency. This can be achieved using RocksDB's WriteBatch.
* Locking: To prevent race conditions, a lock should be held for the duration of the read-modify-write cycle, especially when updating the metadata. This ensures that concurrent operations do not interfere with each other.

# Reference

1. https://redis.io/docs/latest/develop/data-types/probabilistic/cuckoo-filter/
2. https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf

GitHub link: https://github.com/apache/kvrocks/discussions/3079
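# Appendix: In-Memory Sketch of the Per-Shard Operations

The per-shard add/exists/delete flow described in Section 4 can be sketched as a self-contained in-memory model. This is a minimal sketch only: the hash function, fingerprint width, and all names are illustrative assumptions rather than Kvrocks code, and persistence to RocksDB chunks is omitted.

```python
import hashlib
import random

def _h(data: bytes) -> int:
    # Stand-in 64-bit hash; a real implementation would use a fast
    # non-cryptographic hash such as xxHash.
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")

class CuckooShard:
    """In-memory model of one filter shard (illustrative, not Kvrocks code)."""

    def __init__(self, num_buckets: int = 1024, bucket_size: int = 2,
                 max_iterations: int = 20):
        # Power-of-two bucket count keeps the XOR trick below in range.
        assert num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_iterations = max_iterations
        self.buckets = [[] for _ in range(num_buckets)]
        self.size = 0

    def _fingerprint(self, item: bytes) -> int:
        # 8-bit fingerprint; never 0, so 0 could encode "empty slot" on disk.
        return (_h(b"fp:" + item) & 0xFF) or 1

    def _alt_bucket(self, i: int, fp: int) -> int:
        # Partial-key cuckoo hashing: the alternate bucket depends only on the
        # current bucket and the fingerprint, so an evicted fingerprint can be
        # relocated without access to the original item.
        return (i ^ _h(bytes([fp]))) & (self.num_buckets - 1)

    def _candidates(self, item: bytes):
        fp = self._fingerprint(item)
        i1 = _h(item) & (self.num_buckets - 1)
        return fp, i1, self._alt_bucket(i1, fp)

    def add(self, item: bytes) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                self.size += 1
                return True
        # Both candidate buckets are full: kick a resident fingerprint to its
        # alternate bucket, repeating up to max_iterations times.
        i = random.choice((i1, i2))
        for _ in range(self.max_iterations):
            victim = random.randrange(len(self.buckets[i]))
            fp, self.buckets[i][victim] = self.buckets[i][victim], fp
            i = self._alt_bucket(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                self.size += 1
                return True
        # Shard considered full; the caller would trigger expansion. (This
        # sketch drops the last evicted fingerprint; a production version
        # would undo the kicks or carry it into the new sub-filter.)
        return False

    def exists(self, item: bytes) -> bool:
        fp, i1, i2 = self._candidates(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: bytes) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                self.size -= 1
                return True
        return False
```

In the full design, `add`, `exists`, and `delete` correspond to the per-shard steps of CF.ADD, CF.EXISTS, and CF.DELETE, with the bucket arrays read from and written back to the RocksDB chunk keys instead of living in memory, and a `False` return from `add` triggering the expansion flow of Section 4.2.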