# Cuckoo Filter Design Proposal


  # 1. Introduction

In https://github.com/apache/kvrocks/issues/2534, we plan to support the Cuckoo
Filter.

This document outlines the design for implementing a Cuckoo Filter in Kvrocks. 
The Cuckoo Filter is a high-performance, probabilistic data structure used for 
approximate set membership testing. It allows for fast checks to see if an item 
is in a set and, unlike Bloom Filters, supports item deletion.

  # 2. Core Concepts

   * Fingerprint: Instead of storing the items themselves, the Cuckoo Filter 
stores a "fingerprint" of the item, which is a small hash derived from the 
original item.
   * Two Candidate Buckets: For any given item, it can be stored in one of two 
possible buckets. The locations of these buckets are determined by two hash 
functions.
   * Cuckoo Hashing / Kicking: When a new item needs to be inserted and both of 
its candidate buckets are full, the filter will evict an existing item from one 
of the buckets to make space.
     This evicted item is then re-inserted into its alternate location, which 
may cause a cascade of evictions. This process continues until all items find a 
home or a maximum number of evictions is reached.

   * Expansion: If the kicking process fails to find a home for an item after a
set number of attempts, the filter is considered full. An expansion is then
triggered, appending a new, larger sub-filter to the chain so that more data
can be accommodated (see Section 4.2). A sketch of the fingerprint and
candidate-bucket computation follows this list.
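
To make the fingerprint and candidate-bucket computation concrete, below is a
minimal C++ sketch of partial-key cuckoo hashing as described in the CoNEXT
2014 paper. The hash functions and the 8-bit fingerprint width are illustrative
assumptions, not the final Kvrocks choices.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Derive a small, non-zero fingerprint from the item (0 is reserved
// to mark an empty slot in this sketch).
uint8_t Fingerprint(const std::string &item) {
  uint8_t fp = static_cast<uint8_t>(std::hash<std::string>{}(item));
  return fp == 0 ? 1 : fp;
}

// First candidate bucket: a plain hash of the item.
// num_buckets is assumed to be a power of two.
uint64_t Bucket1(const std::string &item, uint64_t num_buckets) {
  return std::hash<std::string>{}(item) & (num_buckets - 1);
}

// Alternate bucket: depends only on the current bucket and the
// fingerprint (i2 = i1 XOR hash(fp)), so an evicted fingerprint can be
// relocated without re-reading the original item. Applying this twice
// returns the original bucket when num_buckets is a power of two.
uint64_t AltBucket(uint64_t bucket, uint8_t fp, uint64_t num_buckets) {
  return (bucket ^ (fp * 0x5bd1e995ULL)) & (num_buckets - 1);
}
```

Keeping the bucket count a power of two makes the XOR relation an involution,
which is what lets the kicking process move a fingerprint back and forth
between its two homes.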

 # 3. Data Structure Design

  To efficiently implement the Cuckoo Filter in Kvrocks, we will adopt a 
sharded (or chained) storage model. This model breaks down a large, logical 
Cuckoo Filter into smaller, manageable chunks, each stored as a separate 
key-value pair in RocksDB.

  ## 3.1. Metadata


  The metadata stores the overall information and configuration for the Cuckoo 
Filter. Each Cuckoo Filter instance corresponds to a single metadata key-value 
pair in RocksDB.

   * Key Format: {namespace}:cuckoo_filter:{user_key}
   * Value Format: A serialized CuckooChainMetadata object. The structure of 
this object is detailed below.

  ### CuckooChainMetadata Structure

  The metadata value is a tightly packed binary structure totaling 26 bytes.


| Field Name | Data Type | Size (Bytes) | Description |
|---|---|---|---|
| n_filters | uint32_t | 4 | The number of filter shards in the chain. |
| capacity | uint64_t | 8 | The number of buckets in each individual filter shard. |
| size | uint64_t | 8 | The current number of items stored in the filter chain. |
| max_iterations | uint32_t | 4 | The maximum number of kicks allowed during an insertion before expansion is triggered. |
| bucket_size | uint8_t | 1 | The number of fingerprints (slots) each bucket can hold (e.g., 2 or 4). |
| expansion | uint8_t | 1 | The factor by which the filter's capacity grows during expansion (e.g., 2 for doubling). |
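
A minimal C++ sketch of this layout is shown below; the field order and the
1-byte packing are assumptions here, and the real encoding should follow
Kvrocks' existing metadata conventions.

```cpp
#include <cstdint>

#pragma pack(push, 1)
struct CuckooChainMetadata {
  uint32_t n_filters;       // number of filter shards in the chain
  uint64_t capacity;        // buckets per individual filter shard
  uint64_t size;            // items currently stored in the chain
  uint32_t max_iterations;  // kick limit before expansion is triggered
  uint8_t bucket_size;      // fingerprint slots per bucket (e.g., 2 or 4)
  uint8_t expansion;        // capacity growth factor (e.g., 2 for doubling)
};
#pragma pack(pop)

// 4 + 8 + 8 + 4 + 1 + 1 = 26 bytes when tightly packed.
static_assert(sizeof(CuckooChainMetadata) == 26, "metadata must stay tightly packed");
```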





  ## 3.2. Filter Data

 To ensure high performance and scalability, a single logical sub-filter can be
partitioned into multiple smaller chunks. This strategy avoids the performance
pitfalls of storing large values in RocksDB. Each chunk is stored as a separate
key-value pair.


The binary data of the filter is stored in one or more data chunks. The key 
format is designed to uniquely identify each chunk within its logical 
sub-filter.

   * Key Format: cf:{namespace_key}:{filter_index}:{chunk_index}
       * cf: A fixed prefix to identify Cuckoo Filter data.
       * {namespace_key}: The same key used for the metadata.
       * {filter_index}: The 0-based index of the logical sub-filter in the 
chain.
       * {chunk_index}: The 0-based index of the data chunk within that logical 
sub-filter.

   * Value Format: The binary data of that specific chunk, containing the 
actual fingerprints. This is treated as an opaque blob by RocksDB.
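
For illustration, the two key formats can be produced by small helpers like
the following; `MetadataKey` and `ChunkKey` are hypothetical names, not
existing Kvrocks functions.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helpers mirroring the key formats described above.
std::string MetadataKey(const std::string &ns, const std::string &user_key) {
  return ns + ":cuckoo_filter:" + user_key;
}

std::string ChunkKey(const std::string &ns, const std::string &user_key,
                     uint32_t filter_index, uint32_t chunk_index) {
  return "cf:" + MetadataKey(ns, user_key) + ":" +
         std::to_string(filter_index) + ":" + std::to_string(chunk_index);
}

// ChunkKey("namespace", "my_filter", 1, 0)
//   -> "cf:namespace:cuckoo_filter:my_filter:1:0"
```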

  ### Example



  Consider a filter my_filter that has been expanded once. The first sub-filter 
is small and fits in one chunk, while the second, larger sub-filter is split 
into two chunks.

  Metadata Key:

   * Key: namespace:cuckoo_filter:my_filter
   * Value: (Serialized metadata, e.g., {n_filters: 2, filter_chunks: [1, 2],
max_chunk_size: 1048576, ...})
      (Note: `filter_chunks: [1, 2]` indicates filter 0 has 1 chunk, and filter
1 has 2 chunks.)


  Data Chunks for Filter 0:

   * Key: cf:namespace:cuckoo_filter:my_filter:0:0
   * Value: (Binary data for the first sub-filter)

  Data Chunks for Filter 1:

   * Key: cf:namespace:cuckoo_filter:my_filter:1:0
   * Value: (Binary data for the first half of the second sub-filter)

   * Key: cf:namespace:cuckoo_filter:my_filter:1:1
   * Value: (Binary data for the second half of the second sub-filter)


  ## 3.3. Storage Layout in RocksDB

  The following diagram illustrates how a Cuckoo Filter with two sub-filters
(stored as three data chunks) is laid out in RocksDB:



```
RocksDB Storage (Optimized Schema)

+---------------------------------------------------------------+
|                        Metadata Entry                         |
|                                                               |
|  Key: "namespace:cuckoo_filter:my_filter"                     |
|                                                               |
|  Value: (serialized metadata, shown as JSON for readability)  |
|    {                                                          |
|      "n_filters": 2,                                          |
|      "filter_chunks": [1, 2],   // filter 0 has 1 chunk,      |
|                                 // filter 1 has 2 chunks      |
|      "max_chunk_size": 1048576, // max chunk size is 1MB      |
|      "bucket_size": 2,                                        |
|      "expansion": 2,                                          |
|      ...                                                      |
|    }                                                          |
+---------------------------------------------------------------+

+---------------------------------------------------------------+
|                   Data Chunk for Filter #0                    |
|                                                               |
|  Key: "cf:namespace:cuckoo_filter:my_filter:0:0"              |
|  //                                         ^ ^               |
|  //                              filter index | chunk index   |
|                                                               |
|  Value: (binary data for the first sub-filter, size <= 1MB)   |
+---------------------------------------------------------------+

+---------------------------------------------------------------+
|                  Data Chunk #0 for Filter #1                  |
|                                                               |
|  Key: "cf:namespace:cuckoo_filter:my_filter:1:0"              |
|                                                               |
|  Value: (binary data for the first half of the second         |
|          sub-filter, size <= 1MB)                             |
+---------------------------------------------------------------+

+---------------------------------------------------------------+
|                  Data Chunk #1 for Filter #1                  |
|                                                               |
|  Key: "cf:namespace:cuckoo_filter:my_filter:1:1"              |
|                                                               |
|  Value: (binary data for the second half of the second        |
|          sub-filter, size <= 1MB)                             |
+---------------------------------------------------------------+
```

  # 4. Operational Flow

  ## 4.1. Add Operation (CF.ADD)

   1. Read Metadata: Read the metadata for the given user_key.
   2. Compute Candidates: Calculate the item's fingerprint and its two
candidate bucket locations, (filter_idx1, bucket_in_filter1) and (filter_idx2,
bucket_in_filter2).
   3. Read Shards: Read the data for the corresponding one or two shards from
RocksDB.
   4. Try Direct Insert: Attempt to insert the fingerprint into an empty slot
in either bucket.
   5. Kick if Full: If both buckets are full, begin the cuckoo kicking process,
which may involve reading and writing different shards as fingerprints are
moved.
   6. Commit: If insertion succeeds, increment size in the metadata and
atomically write the updated metadata and modified shard(s) back to RocksDB.
   7. Expand on Failure: If kicking exceeds max_iterations, the insertion
fails; trigger the Expansion process described in 4.2. A sketch of the kicking
loop follows this list.
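
Below is a simplified in-memory sketch of steps 4 through 7. It assumes a
single sub-filter held in memory, reuses AltBucket from the Section 2 sketch,
and leaves out the RocksDB chunk I/O; all names are illustrative.

```cpp
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

uint64_t AltBucket(uint64_t bucket, uint8_t fp, uint64_t num_buckets);  // Section 2 sketch

constexpr uint8_t kEmpty = 0;  // 0 marks an empty slot in this sketch

struct SubFilter {
  std::vector<std::vector<uint8_t>> buckets;  // buckets[i] holds bucket_size slots
};

// Place fp into an empty slot of bucket i, if one exists.
bool TryPut(SubFilter &f, uint64_t i, uint8_t fp) {
  for (auto &slot : f.buckets[i]) {
    if (slot == kEmpty) { slot = fp; return true; }
  }
  return false;
}

// Returns false once max_iterations kicks are exhausted; the caller then
// triggers the Expansion flow of section 4.2 and retries the insert.
bool Insert(SubFilter &f, uint64_t i1, uint64_t i2, uint8_t fp, uint32_t max_iterations) {
  if (TryPut(f, i1, fp) || TryPut(f, i2, fp)) return true;
  std::mt19937 rng{std::random_device{}()};
  uint64_t num_buckets = f.buckets.size();
  uint64_t i = (rng() & 1) ? i1 : i2;  // pick a random candidate to kick from
  for (uint32_t n = 0; n < max_iterations; ++n) {
    auto &bucket = f.buckets[i];
    std::swap(fp, bucket[rng() % bucket.size()]);  // evict a random victim
    i = AltBucket(i, fp, num_buckets);             // victim's alternate home
    if (TryPut(f, i, fp)) return true;
  }
  return false;  // filter is effectively full
}
```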

  ## 4.2. Expansion

   1. Update Metadata:
       * Increment n_filters.
       * Calculate the new total capacity based on the expansion factor.
   2. Create New Shard:
       * Generate a new, empty filter data block.
       * Atomically write the new metadata and the new filter shard to RocksDB.
   3. Retry Insert: After expansion, re-attempt the insert operation, which
will now target the newly created empty shard. A sketch of this flow follows.
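
A sketch of the expansion step under the same assumptions (CuckooChainMetadata
from 3.1, a hypothetical Serialize() for the metadata, and 1-byte
fingerprints); the new shard and the updated metadata go into one WriteBatch.

```cpp
#include <string>

#include <rocksdb/db.h>
#include <rocksdb/write_batch.h>

std::string Serialize(const CuckooChainMetadata &meta);  // hypothetical encoder

rocksdb::Status Expand(rocksdb::DB *db, const std::string &meta_key,
                       const std::string &data_key_prefix, CuckooChainMetadata &meta) {
  meta.capacity *= meta.expansion;        // new shard grows by the expansion factor
  uint32_t new_index = meta.n_filters++;  // append the shard to the chain

  // One empty chunk, assuming 1-byte fingerprints; a shard larger than
  // max_chunk_size would be split across several chunk keys instead.
  std::string empty_shard(meta.capacity * meta.bucket_size, '\0');

  rocksdb::WriteBatch batch;  // metadata + new shard committed atomically
  batch.Put(meta_key, Serialize(meta));
  batch.Put(data_key_prefix + std::to_string(new_index) + ":0", empty_shard);
  return db->Write(rocksdb::WriteOptions(), &batch);
}
```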

  ## 4.3. Exists Operation (CF.EXISTS)

   1. Read Metadata: Read the metadata for the user_key.
   2. Iterate Through Shards:
       * Loop from shard_index = 0 to n_filters - 1.
       * For each shard, read its data from RocksDB.
       * Check if the item exists in the current shard.
       * If the item is found in any shard, return True immediately.
   3. Return Result: If the loop completes without finding the item, return
False. A sketch of this loop follows.
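
The shard loop can be sketched as follows, reusing the hypothetical ChunkKey
helper from 3.2 and assuming each sub-filter fits in a single chunk;
ShardContains is a stand-in for the actual bucket probe.

```cpp
#include <cstdint>
#include <string>

#include <rocksdb/db.h>

std::string ChunkKey(const std::string &ns, const std::string &user_key,
                     uint32_t filter_index, uint32_t chunk_index);  // from 3.2

// Stand-in: checks both candidate buckets of `item` inside one shard.
bool ShardContains(const std::string &shard_data, const std::string &item);

bool Exists(rocksdb::DB *db, const CuckooChainMetadata &meta, const std::string &ns,
            const std::string &user_key, const std::string &item) {
  for (uint32_t i = 0; i < meta.n_filters; ++i) {
    std::string shard;
    auto s = db->Get(rocksdb::ReadOptions(), ChunkKey(ns, user_key, i, 0), &shard);
    if (s.ok() && ShardContains(shard, item)) return true;  // found in shard i
  }
  return false;  // not present in any shard
}
```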

  ## 4.4. Delete Operation (CF.DELETE)

  The delete operation is similar to the exists operation, requiring a search 
through the shards.

   1. Read Metadata: Read the metadata for the user_key.
   2. Iterate Through Shards:
       * Loop from shard_index = 0 to n_filters - 1.
       * For each shard, read its data and attempt to delete the item.
       * If the item is successfully deleted, update the size in the metadata, 
write the updated metadata and shard data back to RocksDB, and return True.
   3. Return Result: If the loop completes without deleting the item, return
False. A sketch follows.
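
Deletion follows the same loop, with a write-back on success; ShardDelete is a
stand-in that removes one matching fingerprint from a candidate bucket, and the
other helpers are the hypothetical ones from the earlier sketches.

```cpp
#include <cstdint>
#include <string>

#include <rocksdb/db.h>
#include <rocksdb/write_batch.h>

// Stand-ins reused from the earlier sketches.
std::string MetadataKey(const std::string &ns, const std::string &user_key);
std::string ChunkKey(const std::string &ns, const std::string &user_key,
                     uint32_t filter_index, uint32_t chunk_index);
std::string Serialize(const CuckooChainMetadata &meta);
bool ShardDelete(std::string &shard_data, const std::string &item);

bool Delete(rocksdb::DB *db, CuckooChainMetadata &meta, const std::string &ns,
            const std::string &user_key, const std::string &item) {
  for (uint32_t i = 0; i < meta.n_filters; ++i) {
    std::string shard;
    std::string key = ChunkKey(ns, user_key, i, 0);
    if (!db->Get(rocksdb::ReadOptions(), key, &shard).ok()) continue;
    if (ShardDelete(shard, item)) {
      --meta.size;                // keep the global item count in sync
      rocksdb::WriteBatch batch;  // metadata + shard written atomically
      batch.Put(MetadataKey(ns, user_key), Serialize(meta));
      batch.Put(key, shard);
      return db->Write(rocksdb::WriteOptions(), &batch).ok();
    }
  }
  return false;  // item was not found in any shard
}
```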

# 5. Concurrency Safety

   * Atomic Writes: All write operations (ADD, DELETE, EXPAND) must be atomic 
to ensure data consistency. This can be achieved using RocksDB's WriteBatch.
   * Locking: To prevent race conditions, a lock should be acquired for the 
duration of the read-modify-write cycle, especially when updating metadata. 
This ensures that concurrent operations do not interfere with each other.
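
A sketch of the locked read-modify-write cycle; the std::mutex here stands in
for whatever per-key locking primitive Kvrocks already provides.

```cpp
#include <mutex>
#include <string>

#include <rocksdb/db.h>
#include <rocksdb/write_batch.h>

std::mutex key_mutex;  // stand-in for a per-key lock

rocksdb::Status LockedUpdate(rocksdb::DB *db, const std::string &meta_key,
                             const std::string &shard_key) {
  std::lock_guard<std::mutex> guard(key_mutex);  // held for the whole cycle

  std::string meta, shard;
  db->Get(rocksdb::ReadOptions(), meta_key, &meta);   // 1. read
  db->Get(rocksdb::ReadOptions(), shard_key, &shard);

  // 2. modify meta and shard in memory (insert/delete a fingerprint,
  //    adjust size), as described in section 4.

  rocksdb::WriteBatch batch;                          // 3. single atomic write
  batch.Put(meta_key, meta);
  batch.Put(shard_key, shard);
  return db->Write(rocksdb::WriteOptions(), &batch);
}
```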
# 6. References

1. https://redis.io/docs/latest/develop/data-types/probabilistic/cuckoo-filter/
2. https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf

GitHub link: https://github.com/apache/kvrocks/discussions/3079
