[ 
https://issues.apache.org/jira/browse/CASSANALYTICS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANALYTICS-167:
------------------------------------
    Change Category: Performance
         Complexity: Normal
             Status: Open  (was: Triage Needed)

> Regenerate Bloom filters for CQLSSTableWriter-produced SSTables before upload
> -----------------------------------------------------------------------------
>
>                 Key: CASSANALYTICS-167
>                 URL: https://issues.apache.org/jira/browse/CASSANALYTICS-167
>             Project: Apache Cassandra Analytics
>          Issue Type: Improvement
>          Components: Writer
>            Reporter: Yifan Cai
>            Priority: Normal
>
> CQLSSTableWriter produces empty Filter.db files when flushing SSTables. This 
> causes Cassandra nodes to skip Bloom filter checks on imported SSTables, 
> resulting in unnecessary disk reads for every partition key lookup.
> Fixing CQLSSTableWriter upstream requires a new Cassandra release. As a 
> near-term fix, cassandra-analytics will regenerate correct Bloom filters from 
> the SSTable's Index.db before uploading.
> Proposed changes:
> - Add a rebuildBloomFilter method to the CassandraBridge interface, with 
> implementations in FourZeroBridge and FiveZeroBridge
> - Call the rebuild in SortedSSTableWriter.close() after the SSTable flush and 
> before digest computation, so that the digests cover the corrected filter
>
> Jon Haddad reported the issue.
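
The idea behind the proposed rebuild can be sketched as follows. This is only an illustration under stated assumptions, not the cassandra-analytics implementation: the class name, the method names, the hash mixing, and the sizing parameters are all hypothetical, and the in-memory key list stands in for the partition keys that a real rebuild would read from the SSTable's Index.db. It shows the property the rebuild restores: every key present in the SSTable tests positive in the filter.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: none of these names come from Cassandra or cassandra-analytics.
final class BloomFilterRebuildSketch {
    private final BitSet bits;
    private final int numBits;
    private final int hashCount;

    private BloomFilterRebuildSketch(int numBits, int hashCount) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.hashCount = hashCount;
    }

    // Builds a filter from every partition key, as a rebuild step might do
    // after scanning the keys recorded in the SSTable's index.
    static BloomFilterRebuildSketch fromKeys(List<byte[]> keys, int bitsPerKey, int hashCount) {
        int numBits = Math.max(64, keys.size() * bitsPerKey);
        BloomFilterRebuildSketch filter = new BloomFilterRebuildSketch(numBits, hashCount);
        for (byte[] key : keys) {
            filter.add(key);
        }
        return filter;
    }

    private void add(byte[] key) {
        int h1 = mix(key, 0x811C9DC5);
        int h2 = mix(key, 0x01000193) | 1; // force an odd step for double hashing
        for (int i = 0; i < hashCount; i++) {
            bits.set(Math.floorMod(h1 + i * h2, numBits));
        }
    }

    // Returns false only if the key was definitely never added.
    boolean mightContain(byte[] key) {
        int h1 = mix(key, 0x811C9DC5);
        int h2 = mix(key, 0x01000193) | 1;
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                return false;
            }
        }
        return true;
    }

    // FNV-1a-style byte mixing with a caller-supplied seed (illustrative only).
    private static int mix(byte[] data, int seed) {
        int hash = seed;
        for (byte b : data) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }
}
```

In the ordering the issue proposes, a step like fromKeys would run after the SSTable flush and before digest computation, so that the digest is computed over the regenerated filter rather than the empty one.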



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
