GitHub user LiuQhahah added a comment to the discussion: Cuckoo Filter Design Proposal
This is an excellent and insightful suggestion. In response to your suggestion, I have carefully considered it. For those large-capacity cuckoo filters, if we take a one-to-one mapping between filters and the count of rehashing, the efficiency during subsequent query insertions and deletions will not be as high as when multiple rocksDB keys represent sub-filters. This is a schematic diagram. ``` ┌─────────────────────┬────────────────────────────────┬─────────────────────────────────────────────────────────────────────────────┐ │ RocksDB Key │ Value Description │ Details / Calculation │ ├─────────────────────┼────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤ │ "my_large_cuckoo" │ Serialized CuckooChainMetadata │ Contains: num_filters=3, num_buckets=2^19, expansion=2, bucket_size=2, etc. │ │ "my_large_cuckoo:0" │ Data block for Sub-filter 0 │ Size: 2^19 buckets * 2 bytes/bucket = 2^20 bytes (1 MiB) │ │ "my_large_cuckoo:1" │ Data block for Sub-filter 1 │ Size: (2^19 2^1) buckets 2 bytes/bucket = 2^21 bytes (2 MiB) │ │ "my_large_cuckoo:2" │ Data block for Sub-filter 2 │ Size: (2^19 2^2) buckets 2 bytes/bucket = 2^22 bytes (4 MiB) │ └─────────────────────┴────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────┘ ``` will convert into -> ``` ┌───────────────────────┬────────────────────────────────┬──────────────────────────────────────────────────────────────┐ │ RocksDB Key │ Value Description │ Details / Calculation │ ├───────────────────────┼────────────────────────────────┼──────────────────────────────────────────────────────────────┤ │ "my_large_cuckoo" │ Serialized CuckooChainMetadata │ Contains: num_filters=3, filter_chunks=[1, 2, 4], etc. │ │ │ │ │ │ "my_large_cuckoo:0" │ Data block for Sub-filter 0 │ Size: 1 MiB. (Not split as it doesn't exceed the threshold). │ │ │ │ │ │ "my_large_cuckoo:1:0" │ First half of Sub-filter 1 │ Size: 1 MiB. Contains buckets 0 to 2^19 - 1. │ │ "my_large_cuckoo:1:1" │ Second half of Sub-filter 1 │ Size: 1 MiB. Contains buckets 2^19 to 2^20 - 1. │ │ │ │ │ │ "my_large_cuckoo:2:0" │ First quarter of Sub-filter 2 │ Size: 1 MiB. Contains buckets 0 to 2^19 - 1. │ │ "my_large_cuckoo:2:1" │ Second quarter of Sub-filter 2 │ Size: 1 MiB. Contains buckets 2^19 to 2^20 - 1. │ │ "my_large_cuckoo:2:2" │ Third quarter of Sub-filter 2 │ Size: 1 MiB. Contains buckets 2^20 to (2^20 + 2^19) - 1. │ │ "my_large_cuckoo:2:3" │ Fourth quarter of Sub-filter 2 │ Size: 1 MiB. Contains buckets (2^20 + 2^19) to 2^21 - 1. │ └───────────────────────┴────────────────────────────────┴──────────────────────────────────────────────────────────────┘ ``` GitHub link: https://github.com/apache/kvrocks/discussions/3079#discussioncomment-13918799 ---- This is an automatically sent email for issues@kvrocks.apache.org. To unsubscribe, please send an email to: issues-unsubscr...@kvrocks.apache.org