[
https://issues.apache.org/jira/browse/CASSANDRA-18123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Brandon Williams updated CASSANDRA-18123:
-----------------------------------------
Bug Category: Parent values: Degradation(12984), Level 1 values: Performance Bug/Regression(12997)
Complexity: Normal
Discovered By: User Report
Fix Version/s: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 4.x
Severity: Normal
Status: Open (was: Triage Needed)
> Reuse of metadata collector can break key count calculation
> -----------------------------------------------------------
>
> Key: CASSANDRA-18123
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18123
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Compaction
> Reporter: Branimir Lambov
> Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 4.x
>
>
> When flushing a memtable, we currently pass a constructed
> {{MetadataCollector}} to the {{SSTableMultiWriter}} that writes the
> sstables. The writer may decide to split the data into multiple sstables
> (e.g. for separate disks, or as driven by the compaction strategy). If it
> does so, the cardinality estimation component of the reused
> {{MetadataCollector}} recorded for each individual sstable contains the data
> for all of them.
> As a result, when such sstables are compacted, the estimated number of keys
> in the resulting sstables, which determines the size of the bloom filter for
> the compaction result, is far too high.
> This makes L1 bloom filters much bigger than they should be. One example
> (from testing of the upcoming CEP-26, after inserting 100GB of data with
> 10% reads):
> (current)
> {code}
> Bloom filter false positives: 22627369
> Bloom filter false ratio: 0.02257
> Bloom filter space used: 1848247864
> Bloom filter off heap memory used: 2338964088
> {code}
> (fixed)
> {code}
> Bloom filter false positives: 24426545
> Bloom filter false ratio: 0.02429
> Bloom filter space used: 1118910096
> Bloom filter off heap memory used: 1532357432
> {code}
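
The collector reuse described in the issue can be illustrated with a small, self-contained sketch. This is not Cassandra code: `KeyCollector`, `flush` and `compaction_estimate` are hypothetical stand-ins, an exact Python set replaces the probabilistic cardinality estimator, and the compaction is assumed to combine the per-sstable estimators by union. It only shows why sharing one collector across split sstables inflates the key estimate for a compaction that covers a subset of them.

```python
class KeyCollector:
    """Hypothetical stand-in for the cardinality part of a metadata collector.

    Uses an exact set instead of a probabilistic estimator (assumption).
    """

    def __init__(self):
        self.keys = set()

    def add(self, key):
        self.keys.add(key)

    def estimate(self):
        return len(self.keys)


def flush(keys, shard_count, reuse):
    """Split one flush across shard_count sstables.

    Returns one collector per sstable. With reuse=True every sstable
    gets the same collector object, mimicking the behaviour in the issue.
    """
    shared = KeyCollector()
    collectors = [shared if reuse else KeyCollector() for _ in range(shard_count)]
    for i, key in enumerate(keys):
        # Round-robin split; the real split criterion would differ.
        collectors[i % shard_count].add(key)
    return collectors


def compaction_estimate(collectors):
    """Estimate keys in a compaction result by merging the input estimators
    (assumed here to behave like a set union)."""
    merged = set()
    for collector in collectors:
        merged |= collector.keys
    return len(merged)


keys = [f"key-{i}" for i in range(1000)]
reused = flush(keys, 4, reuse=True)
fresh = flush(keys, 4, reuse=False)

print([c.estimate() for c in reused])   # [1000, 1000, 1000, 1000]
print([c.estimate() for c in fresh])    # [250, 250, 250, 250]

# Compacting only the first two sstables of the flush:
print(compaction_estimate(reused[:2]))  # 1000, although they hold 500 keys
print(compaction_estimate(fresh[:2]))   # 500
```

In this sketch the overestimate is a factor of two for a compaction over two of the four sstables; the bloom filter for the result would be sized for the inflated count. For scale, the figures quoted in the issue show bloom filter space falling from 1848247864 to 1118910096 bytes with the fix (roughly 39% less), while the false-positive ratio moves from 0.02257 to 0.02429.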
--
This message was sent by Atlassian Jira
(v8.20.10#820010)