GitHub user LiuQhahah added a comment to the discussion: Cuckoo Filter Design 
Proposal

This is an excellent and insightful suggestion. 
In response to your suggestion, I have carefully considered it. 

For those large-capacity cuckoo filters, if we take a one-to-one mapping 
between filters and the count of rehashing, the efficiency during subsequent 
query insertions and deletions will not be as high as when multiple rocksDB 
keys represent sub-filters.


This is a schematic diagram.
  
```

┌─────────────────────┬────────────────────────────────┬─────────────────────────────────────────────────────────────────────────────┐
│ RocksDB Key         │ Value Description              │ Details / Calculation  
                                                     │
├─────────────────────┼────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ "my_large_cuckoo"   │ Serialized CuckooChainMetadata │ Contains: 
num_filters=3, num_buckets=2^19, expansion=2, bucket_size=2, etc. │
│ "my_large_cuckoo:0" │ Data block for Sub-filter 0    │ Size: 2^19 buckets * 2 
bytes/bucket = 2^20 bytes (1 MiB)                    │
│ "my_large_cuckoo:1" │ Data block for Sub-filter 1    │ Size: (2^19  2^1) 
buckets  2 bytes/bucket = 2^21 bytes (2 MiB)              │
│ "my_large_cuckoo:2" │ Data block for Sub-filter 2    │ Size: (2^19  2^2) 
buckets  2 bytes/bucket = 2^22 bytes (4 MiB)              │
└─────────────────────┴────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────┘
```

will convert into -> 


```

┌───────────────────────┬────────────────────────────────┬──────────────────────────────────────────────────────────────┐
│ RocksDB Key           │ Value Description              │ Details / 
Calculation                                        │
├───────────────────────┼────────────────────────────────┼──────────────────────────────────────────────────────────────┤
│ "my_large_cuckoo"     │ Serialized CuckooChainMetadata │ Contains: 
num_filters=3, filter_chunks=[1, 2, 4], etc.       │
│                       │                                │                      
                                        │
│ "my_large_cuckoo:0"   │ Data block for Sub-filter 0    │ Size: 1 MiB. (Not 
split as it doesn't exceed the threshold). │
│                       │                                │                      
                                        │
│ "my_large_cuckoo:1:0" │ First half of Sub-filter 1     │ Size: 1 MiB. 
Contains buckets 0 to 2^19 - 1.                 │
│ "my_large_cuckoo:1:1" │ Second half of Sub-filter 1    │ Size: 1 MiB. 
Contains buckets 2^19 to 2^20 - 1.              │
│                       │                                │                      
                                        │
│ "my_large_cuckoo:2:0" │ First quarter of Sub-filter 2  │ Size: 1 MiB. 
Contains buckets 0 to 2^19 - 1.                 │
│ "my_large_cuckoo:2:1" │ Second quarter of Sub-filter 2 │ Size: 1 MiB. 
Contains buckets 2^19 to 2^20 - 1.              │
│ "my_large_cuckoo:2:2" │ Third quarter of Sub-filter 2  │ Size: 1 MiB. 
Contains buckets 2^20 to (2^20 + 2^19) - 1.     │
│ "my_large_cuckoo:2:3" │ Fourth quarter of Sub-filter 2 │ Size: 1 MiB. 
Contains buckets (2^20 + 2^19) to 2^21 - 1.     │
└───────────────────────┴────────────────────────────────┴──────────────────────────────────────────────────────────────┘

```




GitHub link: 
https://github.com/apache/kvrocks/discussions/3079#discussioncomment-13918799

----
This is an automatically sent email for issues@kvrocks.apache.org.
To unsubscribe, please send an email to: issues-unsubscr...@kvrocks.apache.org

Reply via email to