Yifan Cai created CASSANALYTICS-167:
---------------------------------------
Summary: Regenerate Bloom filters for CQLSSTableWriter-produced
SSTables before upload
Key: CASSANALYTICS-167
URL: https://issues.apache.org/jira/browse/CASSANALYTICS-167
Project: Apache Cassandra Analytics
Issue Type: Improvement
Components: Writer
Reporter: Yifan Cai
CQLSSTableWriter produces empty Filter.db files when flushing SSTables. This
causes Cassandra nodes to skip Bloom filter checks on imported SSTables,
resulting in unnecessary disk reads for every partition key lookup.
Fixing CQLSSTableWriter upstream requires a new Cassandra release. As a
near-term fix, cassandra-analytics will regenerate correct Bloom filters from
the SSTable's Index.db before uploading.
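The idea of rebuilding a filter from the index can be sketched as follows. This is only an illustration under stated assumptions: `SimpleBloomFilter` and `rebuild` are hypothetical names, the hashing is a simple stand-in, and nothing here reflects Cassandra's actual Bloom filter implementation or the on-disk Filter.db format. The list of keys stands in for the partition keys that would be read from Index.db.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.List;

public class BloomFilterRebuildSketch {

    /** Minimal Bloom filter for illustration; NOT Cassandra's implementation. */
    static final class SimpleBloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        SimpleBloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        void add(byte[] key) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(index(key, i));
            }
        }

        /** False means "definitely absent"; true means "possibly present". */
        boolean mightContain(byte[] key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(key, i))) {
                    return false;
                }
            }
            return true;
        }

        // Double hashing: derive the i-th bit position from two base hashes.
        private int index(byte[] key, int i) {
            int h1 = hash(key, 0x9747b28c);
            int h2 = hash(key, 0x85ebca6b);
            return Math.floorMod(h1 + i * h2, numBits);
        }

        // FNV-1a-style mixing with a seed; a stand-in hash, chosen for brevity.
        private static int hash(byte[] key, int seed) {
            int h = seed;
            for (byte b : key) {
                h ^= (b & 0xff);
                h *= 16777619;
            }
            return h;
        }
    }

    /**
     * Hypothetical rebuild step: the keys stand in for the partition keys
     * that a real implementation would iterate from the SSTable's Index.db.
     */
    static SimpleBloomFilter rebuild(List<byte[]> partitionKeys, int bitsPerKey, int numHashes) {
        int numBits = Math.max(64, partitionKeys.size() * bitsPerKey);
        SimpleBloomFilter filter = new SimpleBloomFilter(numBits, numHashes);
        for (byte[] key : partitionKeys) {
            filter.add(key);
        }
        return filter;
    }

    public static void main(String[] args) {
        List<byte[]> keys = List.of(
            "pk-1".getBytes(StandardCharsets.UTF_8),
            "pk-2".getBytes(StandardCharsets.UTF_8),
            "pk-3".getBytes(StandardCharsets.UTF_8));
        SimpleBloomFilter filter = rebuild(keys, 10, 3);
        // Every key that was added must be reported as possibly present.
        for (byte[] key : keys) {
            System.out.println(new String(key, StandardCharsets.UTF_8)
                + " -> " + filter.mightContain(key));
        }
    }
}
```

A filter populated this way never returns a false negative for a key in the index, which is the property a read path relies on to skip SSTables that cannot contain the requested partition.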
Proposed changes:
- Add a rebuildBloomFilter method to the CassandraBridge interface, with
implementations in FourZeroBridge and FiveZeroBridge
- Call the rebuild in SortedSSTableWriter.close() after the SSTable flush and
before digest computation, so that the digests cover the regenerated filter
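Why the ordering matters can be shown with a small sketch. Everything here is an assumption made for illustration: `rebuildFilter` is a hypothetical stand-in that just writes placeholder bytes, SHA-256 stands in for whatever digest algorithm the writer actually uses, and the file is created in a temporary directory rather than by a real SSTable flush.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.Arrays;

public class CloseOrderingSketch {

    /** Digest of a file's bytes; SHA-256 is a stand-in choice. */
    static byte[] digest(Path file) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
    }

    /** Hypothetical rebuild: overwrites the filter file with placeholder content. */
    static void rebuildFilter(Path filterFile) throws Exception {
        Files.write(filterFile, new byte[] {1, 2, 3, 4});
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("sstable-sketch");
        Path filterFile = dir.resolve("Filter.db");

        // Step 1: the flush leaves an empty filter file behind.
        Files.write(filterFile, new byte[0]);

        // Wrong order: a digest taken here describes the empty file.
        byte[] digestBeforeRebuild = digest(filterFile);

        // Step 2: regenerate the filter.
        rebuildFilter(filterFile);

        // Step 3: a digest taken now describes the file that will be uploaded.
        byte[] digestAfterRebuild = digest(filterFile);

        byte[] uploaded = digest(filterFile);
        System.out.println("digest before rebuild matches uploaded file: "
            + Arrays.equals(digestBeforeRebuild, uploaded));
        System.out.println("digest after rebuild matches uploaded file:  "
            + Arrays.equals(digestAfterRebuild, uploaded));

        Files.delete(filterFile);
        Files.delete(dir);
    }
}
```

A digest computed before the rebuild would describe a file that no longer exists in that form, so any integrity check against the uploaded component would fail.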
Jon Haddad reported the issue.