[
https://issues.apache.org/jira/browse/CASSANDRA-18123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Branimir Lambov updated CASSANDRA-18123:
----------------------------------------
Since Version: 3.0.0
> Reuse of metadata collector can break key count calculation
> -----------------------------------------------------------
>
> Key: CASSANDRA-18123
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18123
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Compaction
> Reporter: Branimir Lambov
> Priority: Normal
>
> When flushing a memtable, we currently pass a single pre-constructed
> {{MetadataCollector}} to the {{SSTableMultiWriter}} that writes the sstables.
> The writer may decide to split the data into multiple sstables (e.g. for
> separate disks, or because the compaction strategy requests it). When it does,
> the cardinality estimation component of the reused {{MetadataCollector}} ends
> up containing the keys of all of the sstables, so each individual sstable's
> metadata reports the combined key count instead of its own.
> As a result, when such sstables are compacted, the estimated number of keys
> in the resulting sstables, which is used to size the bloom filter of the
> compaction result, is heavily inflated (illustrated by the sketch at the end
> of this description).
> This produces L1 bloom filters that are much bigger than they should be. One
> example (observed during testing of the upcoming CEP-26, after inserting
> 100GB of data with 10% reads):
> (current)
> {code}
> Bloom filter false positives: 22627369
> Bloom filter false ratio: 0.02257
> Bloom filter space used: 1848247864
> Bloom filter off heap memory used: 2338964088
> {code}
> (fixed)
> {code}
> Bloom filter false positives: 24426545
> Bloom filter false ratio: 0.02429
> Bloom filter space used: 1118910096
> Bloom filter off heap memory used: 1532357432
> {code}
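> The effect can be reproduced outside Cassandra. The sketch below is not
> Cassandra code: {{KeyCountEstimator}} is only a stand-in for the cardinality
> estimation component inside {{MetadataCollector}}, and a textbook bloom
> filter sizing formula stands in for Cassandra's actual calculation. It only
> demonstrates how reusing one estimator across the writers of a multi-writer
> inflates every per-sstable key estimate and, with it, the bloom filter size.
> {code}
> import java.util.HashSet;
> import java.util.Set;
>
> public class SharedEstimatorDemo
> {
>     /** Stand-in for the cardinality estimation component of the metadata collector. */
>     static class KeyCountEstimator
>     {
>         private final Set<Long> keys = new HashSet<>();
>
>         void add(long key)
>         {
>             keys.add(key);
>         }
>
>         long estimate()
>         {
>             return keys.size();
>         }
>     }
>
>     /** Textbook bloom filter sizing: bits = -n * ln(p) / (ln 2)^2. */
>     static long bloomFilterBits(long keyCount, double falsePositiveRate)
>     {
>         return (long) Math.ceil(-keyCount * Math.log(falsePositiveRate)
>                                 / (Math.log(2) * Math.log(2)));
>     }
>
>     public static void main(String[] args)
>     {
>         int sstables = 8;             // e.g. one sstable per disk
>         int keysPerSSTable = 100_000;
>
>         // Buggy path: one estimator is reused by every writer of the multi-writer,
>         // so it accumulates the keys of all sstables produced by the flush.
>         KeyCountEstimator shared = new KeyCountEstimator();
>         long key = 0;
>         for (int i = 0; i < sstables; i++)
>             for (int k = 0; k < keysPerSSTable; k++)
>                 shared.add(key++);
>         long sharedEstimate = shared.estimate();      // reported for EACH sstable
>
>         // Fixed path: each sstable gets its own estimator.
>         KeyCountEstimator own = new KeyCountEstimator();
>         for (int k = 0; k < keysPerSSTable; k++)
>             own.add(k);
>         long ownEstimate = own.estimate();
>
>         System.out.printf("per-sstable key estimate, shared collector:   %,d%n", sharedEstimate);
>         System.out.printf("per-sstable key estimate, separate collector: %,d%n", ownEstimate);
>         System.out.printf("bloom filter bits at 1%% fp, shared:   %,d%n",
>                           bloomFilterBits(sharedEstimate, 0.01));
>         System.out.printf("bloom filter bits at 1%% fp, separate: %,d%n",
>                           bloomFilterBits(ownEstimate, 0.01));
>     }
> }
> {code}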