Yifan Cai created CASSANALYTICS-167:
---------------------------------------

             Summary: Regenerate Bloom filters for CQLSSTableWriter-produced 
SSTables before upload
                 Key: CASSANALYTICS-167
                 URL: https://issues.apache.org/jira/browse/CASSANALYTICS-167
             Project: Apache Cassandra Analytics
          Issue Type: Improvement
          Components: Writer
            Reporter: Yifan Cai


CQLSSTableWriter produces empty Filter.db files when flushing SSTables. This 
causes Cassandra nodes to skip Bloom filter checks on imported SSTables, 
resulting in unnecessary disk reads for every partition key lookup.
Fixing CQLSSTableWriter upstream requires a new Cassandra release. As a 
near-term fix, cassandra-analytics will regenerate correct Bloom filters from 
the SSTable's Index.db before uploading.
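
For illustration only, rebuilding a filter from a list of partition keys can be sketched as below. This is a toy Bloom filter, not Cassandra's on-disk Filter.db format, and it does not parse Index.db. The class ToyBloomFilter, its hash functions, and the sizing parameters are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Toy Bloom filter (hypothetical; NOT Cassandra's filter implementation or file format).
public class ToyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    public ToyBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a, used here only as a second, independent-looking hash.
    private static int fnv1a(byte[] key) {
        int hash = 0x811C9DC5;
        for (byte b : key) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }

    // Double hashing: the i-th bit position is (h1 + i * h2) mod numBits.
    private int bitIndex(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = fnv1a(key);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    public void add(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(bitIndex(key, i));
        }
    }

    // Returns false only if the key was definitely never added.
    public boolean mightContain(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(bitIndex(key, i))) {
                return false;
            }
        }
        return true;
    }

    // Rebuild a filter from the partition keys, as if they had been read from an index.
    public static ToyBloomFilter rebuild(List<byte[]> partitionKeys, int bitsPerKey, int numHashes) {
        int numBits = Math.max(64, partitionKeys.size() * bitsPerKey);
        ToyBloomFilter filter = new ToyBloomFilter(numBits, numHashes);
        for (byte[] key : partitionKeys) {
            filter.add(key);
        }
        return filter;
    }

    // Convenience helper for building example keys.
    public static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

A filter rebuilt this way never reports a false negative for a key that was added, which is the property a reader relies on to skip SSTables that cannot contain a partition.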

Proposed changes:

- Add a rebuildBloomFilter method to the CassandraBridge interface, with 
implementations in FourZeroBridge and FiveZeroBridge
- Call the rebuild in SortedSSTableWriter.close() after the SSTable flush and 
before digest computation, so the digests cover the corrected filter
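
The ordering in the second item can be sketched as below. This is a minimal model under stated assumptions, not the actual cassandra-analytics code: the FilterRebuilder interface and its signature, the in-memory component map, and the use of CRC32 as the digest are all hypothetical stand-ins chosen to keep the example self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical model of "flush, then rebuild the filter, then compute digests".
public class CloseOrderSketch {

    // Stand-in for the proposed bridge method; the real signature is not given in the issue.
    public interface FilterRebuilder {
        byte[] rebuildBloomFilter(byte[] indexBytes);
    }

    // CRC32 is used only as an example digest; the real digest algorithm is not assumed here.
    public static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // components maps a component name (e.g. "Index.db") to its bytes after the flush.
    public static Map<String, Long> close(Map<String, byte[]> components, FilterRebuilder rebuilder) {
        // Step 1 (assumed done by the caller): the flush left an empty "Filter.db".

        // Step 2: replace the empty filter with one rebuilt from the index.
        byte[] rebuilt = rebuilder.rebuildBloomFilter(components.get("Index.db"));
        components.put("Filter.db", rebuilt);

        // Step 3: compute digests last, so they describe the rebuilt filter.
        Map<String, Long> digests = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> entry : components.entrySet()) {
            digests.put(entry.getKey(), checksum(entry.getValue()));
        }
        return digests;
    }
}
```

If the digests were computed before the rebuild, the digest recorded for Filter.db would describe the empty file rather than the file that is actually uploaded.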

Jon Haddad reported the issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
