[https://issues.apache.org/jira/browse/CASSANALYTICS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18085715#comment-18085715]

Lukasz Antoniak commented on CASSANALYTICS-167:
-----------------------------------------------

CI link: 
https://app.circleci.com/pipelines/github/lukasz-antoniak/cassandra-analytics/278/workflows/8ef36034-5eb0-4980-9d15-2afe52cee7db

(Screenshot attached: image-2026-06-03-09-19-03-559.png)

> Regenerate Bloom filters for CQLSSTableWriter-produced SSTables before upload
> -----------------------------------------------------------------------------
>
>                 Key: CASSANALYTICS-167
>                 URL: https://issues.apache.org/jira/browse/CASSANALYTICS-167
>             Project: Apache Cassandra Analytics
>          Issue Type: Improvement
>          Components: Writer
>            Reporter: Yifan Cai
>            Assignee: Lukasz Antoniak
>            Priority: Normal
>         Attachments: image-2026-06-03-09-19-03-559.png
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> CQLSSTableWriter produces empty Filter.db files when flushing SSTables. This 
> causes Cassandra nodes to skip Bloom filter checks on imported SSTables, 
> resulting in unnecessary disk reads for every partition key lookup.
> Fixing CQLSSTableWriter upstream requires a new Cassandra release. As a 
> near-term fix, cassandra-analytics will regenerate correct Bloom filters from 
> the SSTable's Index.db before uploading.
> Proposed changes:
> - Add a rebuildBloomFilter method to the CassandraBridge interface, with
> implementations in FourZeroBridge and FiveZeroBridge
> - Call the rebuild in SortedSSTableWriter.close() after the SSTable flush and
> before digest computation, so that the digests cover the regenerated filter
>
> Jon Haddad reported the issue.
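The ordering in the second proposed change matters because a digest computed before the rebuild would hash the empty Filter.db left by the flush, not the file that is actually uploaded. The Java sketch below illustrates that ordering under explicit assumptions: BloomRebuildSketch, ToyBloomFilter, rebuildBloomFilter(List<byte[]>, double) and closeAndDigest are hypothetical names, not the CassandraBridge or SortedSSTableWriter API; the partition keys that would be read from Index.db are simply passed in as a list; the filter uses a toy FNV-1a hash with double hashing and BitSet serialization rather than Cassandra's own hashing and on-disk Filter.db format; and SHA-256 stands in for whatever digest the writer computes before upload.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: names are illustrative, not the cassandra-analytics API,
// and the serialized bytes are NOT Cassandra's Filter.db format.
final class BloomRebuildSketch {

    static final class ToyBloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        ToyBloomFilter(int expectedKeys, double falsePositiveRate) {
            int n = Math.max(1, expectedKeys);
            // Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hash functions
            this.numBits = (int) Math.ceil(-n * Math.log(falsePositiveRate) / (Math.log(2) * Math.log(2)));
            this.numHashes = Math.max(1, (int) Math.round((double) numBits / n * Math.log(2)));
            this.bits = new BitSet(numBits);
        }

        // 64-bit FNV-1a: a toy stand-in for the hash Cassandra actually uses
        private static long fnv1a64(byte[] key) {
            long hash = 0xcbf29ce484222325L;
            for (byte b : key) {
                hash ^= (b & 0xff);
                hash *= 0x100000001b3L;
            }
            return hash;
        }

        // Double hashing: the i-th probe position is (h1 + i * h2) mod numBits
        private int probe(long hash, int i) {
            int h1 = (int) hash;
            int h2 = (int) (hash >>> 32);
            return Math.floorMod(h1 + i * h2, numBits);
        }

        void add(byte[] key) {
            long hash = fnv1a64(key);
            for (int i = 0; i < numHashes; i++) {
                bits.set(probe(hash, i));
            }
        }

        boolean mightContain(byte[] key) {
            long hash = fnv1a64(key);
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(probe(hash, i))) {
                    return false; // definitely absent: this is the check an empty filter cannot answer
                }
            }
            return true; // possibly present
        }

        byte[] serialize() {
            return bits.toByteArray();
        }
    }

    // Rebuild the filter from every partition key (assumed to come from Index.db)
    static ToyBloomFilter rebuildBloomFilter(List<byte[]> partitionKeys, double fpChance) {
        ToyBloomFilter filter = new ToyBloomFilter(partitionKeys.size(), fpChance);
        for (byte[] key : partitionKeys) {
            filter.add(key);
        }
        return filter;
    }

    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Models the proposed close() ordering. The flush has already written an empty
    // Filter.db; it is replaced by the rebuilt filter first, and only then hashed.
    static byte[] closeAndDigest(List<byte[]> partitionKeys, double fpChance) {
        byte[] rebuiltFilterDb = rebuildBloomFilter(partitionKeys, fpChance).serialize();
        return sha256(rebuiltFilterDb);
    }
}
```

If the two steps in closeAndDigest were swapped, the digest would describe the empty flushed Filter.db, and the file actually uploaded would no longer match it.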



--
This message was sent by Atlassian Jira
(v8.20.10#820010)