[ https://issues.apache.org/jira/browse/CASSANALYTICS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18085715#comment-18085715 ]
Lukasz Antoniak commented on CASSANALYTICS-167:
-----------------------------------------------
CI link:
https://app.circleci.com/pipelines/github/lukasz-antoniak/cassandra-analytics/278/workflows/8ef36034-5eb0-4980-9d15-2afe52cee7db
!image-2026-06-03-09-19-03-559.png|width=1032,height=442!
> Regenerate Bloom filters for CQLSSTableWriter-produced SSTables before upload
> -----------------------------------------------------------------------------
>
> Key: CASSANALYTICS-167
> URL: https://issues.apache.org/jira/browse/CASSANALYTICS-167
> Project: Apache Cassandra Analytics
> Issue Type: Improvement
> Components: Writer
> Reporter: Yifan Cai
> Assignee: Lukasz Antoniak
> Priority: Normal
> Attachments: image-2026-06-03-09-19-03-559.png
>
> Time Spent: 3h
> Remaining Estimate: 0h
>
> CQLSSTableWriter produces empty Filter.db files when flushing SSTables. This
> causes Cassandra nodes to skip Bloom filter checks on imported SSTables,
> resulting in unnecessary disk reads for every partition key lookup.
>
> Fixing CQLSSTableWriter upstream requires a new Cassandra release. As a
> near-term fix, cassandra-analytics will regenerate correct Bloom filters from
> the SSTable's Index.db before uploading.
> Proposed changes:
>
> - Add rebuildBloomFilter method to CassandraBridge interface with
> implementations in FourZeroBridge and FiveZeroBridge
> - Call the rebuild in SortedSSTableWriter.close() after the SSTable flush and
> before digest computation, so digests cover the correct filter
>
> Jon Haddad reported the issue.
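
The ordering described above (flush, rebuild the filter from the partition keys, then compute digests) can be illustrated with a minimal, self-contained sketch. It does not use any Cassandra classes: ToyBloomFilter, rebuildFromKeys and sha256 are hypothetical names invented for illustration, the in-memory key list stands in for the keys read from Index.db, and the bit layout is not Cassandra's Filter.db format. The actual rebuildBloomFilter signature is not given in this issue and may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class BloomRebuildSketch {

    /** Toy Bloom filter (hypothetical); NOT Cassandra's on-disk filter format. */
    static final class ToyBloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        ToyBloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        void add(byte[] key) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(index(key, i));
            }
        }

        /** False means "definitely absent"; true means "possibly present". */
        boolean mightContain(byte[] key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(key, i))) {
                    return false;
                }
            }
            return true;
        }

        byte[] toBytes() {
            return bits.toByteArray();
        }

        // Double hashing: the i-th probe position is h1 + i * h2 (mod numBits).
        private int index(byte[] key, int i) {
            int h1 = java.util.Arrays.hashCode(key);
            int h2 = fnv1a(key);
            return Math.floorMod(h1 + i * h2, numBits);
        }

        private static int fnv1a(byte[] data) {
            int h = 0x811c9dc5;
            for (byte b : data) {
                h ^= (b & 0xff);
                h *= 0x01000193;
            }
            return h;
        }
    }

    /** Rebuilds a filter from every partition key (stand-in for an Index.db scan). */
    static ToyBloomFilter rebuildFromKeys(List<byte[]> keys, int bitsPerKey, int numHashes) {
        int numBits = Math.max(64, keys.size() * bitsPerKey);
        ToyBloomFilter filter = new ToyBloomFilter(numBits, numHashes);
        for (byte[] key : keys) {
            filter.add(key);
        }
        return filter;
    }

    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every JVM
        }
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add(("pk-" + i).getBytes(StandardCharsets.UTF_8));
        }
        // Step 1 (flush) is assumed to have happened and left an empty filter.
        // Step 2: rebuild the filter from the keys.
        ToyBloomFilter rebuilt = rebuildFromKeys(keys, 10, 3);
        // Step 3: compute the digest only now, so it covers the rebuilt bytes.
        byte[] digest = sha256(rebuilt.toBytes());
        System.out.println("digest length: " + digest.length);
        System.out.println("first key present: " + rebuilt.mightContain(keys.get(0)));
    }
}
```

Computing the digest after the rebuild matters because a digest taken over the empty filter would no longer match the bytes that are actually uploaded.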
--
This message was sent by Atlassian Jira
(v8.20.10#820010)