[ https://issues.apache.org/jira/browse/CASSANALYTICS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yifan Cai updated CASSANALYTICS-167:
------------------------------------
Change Category: Performance
Complexity: Normal
Status: Open (was: Triage Needed)
> Regenerate Bloom filters for CQLSSTableWriter-produced SSTables before upload
> -----------------------------------------------------------------------------
>
> Key: CASSANALYTICS-167
> URL: https://issues.apache.org/jira/browse/CASSANALYTICS-167
> Project: Apache Cassandra Analytics
> Issue Type: Improvement
> Components: Writer
> Reporter: Yifan Cai
> Priority: Normal
>
> CQLSSTableWriter produces empty Filter.db files when flushing SSTables. This
> causes Cassandra nodes to skip Bloom filter checks on imported SSTables,
> resulting in unnecessary disk reads for every partition key lookup.
> Fixing CQLSSTableWriter upstream requires a new Cassandra release. As a
> near-term fix, cassandra-analytics will regenerate correct Bloom filters from
> the SSTable's Index.db before uploading.
> Proposed changes:
>
> - Add rebuildBloomFilter method to CassandraBridge interface with
>   implementations in FourZeroBridge and FiveZeroBridge
> - Call the rebuild in SortedSSTableWriter.close() after the SSTable flush and
>   before digest computation, so digests cover the correct filter
>
> Jon Haddad reported the issue.
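
The ordering proposed above (flush, then rebuild the filter from the index keys, then compute digests) can be sketched with a toy model. This is only an illustration under stated assumptions: ToyBloomFilter, ToyWriter, rebuildFromIndexKeys and the CRC32 "digest" are hypothetical stand-ins, not the CassandraBridge or SortedSSTableWriter APIs, and the hashing scheme is not the one Cassandra uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.zip.CRC32;

// Illustrative sketch only: none of these types are Cassandra or cassandra-analytics classes.
final class BloomRebuildSketch {

    // Toy Bloom filter: numHashes bit positions per key, derived by double hashing.
    static final class ToyBloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        ToyBloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        void add(byte[] key) {
            for (int i = 0; i < numHashes; i++) bits.set(index(key, i));
        }

        // May return a false positive, but never a false negative for an added key.
        boolean mightContain(byte[] key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(key, i))) return false;
            }
            return true;
        }

        byte[] toBytes() { return bits.toByteArray(); }

        private int index(byte[] key, int i) {
            long h1 = fnv1a(key, 0xcbf29ce484222325L);
            long h2 = fnv1a(key, 0x9E3779B97F4A7C15L) | 1L; // odd step for double hashing
            return (int) Long.remainderUnsigned(h1 + i * h2, numBits);
        }

        private static long fnv1a(byte[] data, long seed) {
            long h = seed;
            for (byte b : data) { h ^= (b & 0xff); h *= 0x100000001b3L; }
            return h;
        }
    }

    // Hypothetical stand-in for "regenerate the filter from the keys listed in Index.db".
    static ToyBloomFilter rebuildFromIndexKeys(List<byte[]> keys, int bitsPerKey, int numHashes) {
        int numBits = Math.max(64, keys.size() * bitsPerKey);
        ToyBloomFilter filter = new ToyBloomFilter(numBits, numHashes);
        for (byte[] key : keys) filter.add(key);
        return filter;
    }

    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    // Hypothetical writer that mirrors the proposed ordering inside close().
    static final class ToyWriter {
        final List<byte[]> indexKeys = new ArrayList<>(); // stands in for Index.db
        ToyBloomFilter filter;
        byte[] filterBytes;  // stands in for Filter.db
        long filterDigest;   // stands in for the digest computed before upload

        void addRow(String partitionKey) {
            indexKeys.add(partitionKey.getBytes(StandardCharsets.UTF_8));
        }

        void close() {
            filterBytes = new byte[0];                        // 1. flush leaves an empty filter
            filter = rebuildFromIndexKeys(indexKeys, 10, 7);  // 2. rebuild from the index keys
            filterBytes = filter.toBytes();
            filterDigest = crc32(filterBytes);                // 3. digest covers the rebuilt filter
        }
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        writer.addRow("user:1");
        writer.addRow("user:2");
        writer.close();
        System.out.println("filter bytes: " + writer.filterBytes.length);
        System.out.println("digest matches rebuilt filter: "
                + (writer.filterDigest == crc32(writer.filterBytes)));
    }
}
```

If the digest were computed before step 2, it would describe the empty filter rather than the file that is actually uploaded, which is the reason the issue places the rebuild ahead of digest computation.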
--
This message was sent by Atlassian Jira
(v8.20.10#820010)