[ https://issues.apache.org/jira/browse/CASSANDRA-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387002#comment-16387002 ]
Kurt Greaves commented on CASSANDRA-11163:
------------------------------------------

Correct. As I noted previously:

bq. Only regenerate and persist the bloomfilter when it's missing - not when it has changed. This means we rely on compactions/upgradesstables to update the bloomfilter.

bq. There's definitely no reason to regenerate Summaries in this case, and as previously mentioned it's not great regenerating the bloomfilter unless you're going to persist it. I have added persistence for the bloomfilter (when it is regenerated), however I think it's a bad idea to do this on startup as it will likely be more time consuming than regenerating the summaries.

So the previous behaviour was to regenerate the BF in this case on the next startup but *not* persist it (this meant it would happen on every startup until compactions/upgrades had occurred). The summaries would be regenerated and persisted on the next startup (pointlessly). Both of these things would slow startup time pretty significantly depending on how much data you had.

The new behaviour would be to avoid regenerating BF/Summaries at all on startup and instead rely on upgradesstables/compactions to update them. Summaries would only be recreated when necessary (when not loaded/corrupt/missing).

In trunk it might make sense to also add a nodetool command that will allow us to regenerate the bloomfilters/summaries/etc without re-writing the whole data file.
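To make the difference between the two behaviours concrete, here is a rough Python model of the startup decision. It is only an illustration of the description above, not the actual Cassandra code (which is Java); `StartupPlan`, `plan_old` and `plan_new` are hypothetical names. How the old code handled a missing bloomfilter, and whether a rebuilt summary is persisted under the new behaviour, are assumptions on my part and are marked as such in the comments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StartupPlan:
    """What a node would do for one sstable at startup (hypothetical model)."""
    rebuild_bloom_filter: bool
    persist_bloom_filter: bool
    rebuild_summary: bool
    persist_summary: bool


def plan_old(bf_missing: bool, summary_usable: bool,
             fp_chance_changed: bool) -> StartupPlan:
    """Previous behaviour: an FP ratio change triggers rebuilds on startup."""
    # The bf_missing branch is an assumption; the comment only describes
    # the case where the FP ratio has changed.
    rebuild_bf = fp_chance_changed or bf_missing
    rebuild_summary = fp_chance_changed or not summary_usable
    return StartupPlan(
        rebuild_bloom_filter=rebuild_bf,
        persist_bloom_filter=False,       # rebuilt on every startup, never saved
        rebuild_summary=rebuild_summary,
        persist_summary=rebuild_summary,  # saved, but pointless on an FP change
    )


def plan_new(bf_missing: bool, summary_usable: bool,
             fp_chance_changed: bool) -> StartupPlan:
    """Proposed behaviour: an FP ratio change alone triggers nothing on startup."""
    # fp_chance_changed is deliberately ignored here: compactions and
    # upgradesstables are relied on to pick up the new ratio.
    rebuild_summary = not summary_usable  # not loaded / corrupt / missing
    return StartupPlan(
        rebuild_bloom_filter=bf_missing,
        persist_bloom_filter=bf_missing,  # persisted whenever it is regenerated
        rebuild_summary=rebuild_summary,
        persist_summary=rebuild_summary,  # assumption: rebuilt summaries are saved
    )
```

For the scenario in this ticket (all components present, FP ratio changed), `plan_old` asks for both a bloomfilter and a summary rebuild, while `plan_new` asks for nothing.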
> Summaries are needlessly rebuilt when the BF FP ratio is changed
> ----------------------------------------------------------------
>
> Key: CASSANDRA-11163
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11163
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Brandon Williams
> Assignee: Kurt Greaves
> Priority: Major
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> This is from trunk, but I also saw this happen on 2.0:
> Before:
> {noformat}
> root@bw-1:/srv/cassandra# ls -ltr
> /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/
> total 221460
> drwxr-xr-x 2 root root 4096 Feb 11 23:34 backups
> -rw-r--r-- 1 root root 80 Feb 11 23:50 ma-6-big-TOC.txt
> -rw-r--r-- 1 root root 26518 Feb 11 23:50 ma-6-big-Summary.db
> -rw-r--r-- 1 root root 10264 Feb 11 23:50 ma-6-big-Statistics.db
> -rw-r--r-- 1 root root 2607705 Feb 11 23:50 ma-6-big-Index.db
> -rw-r--r-- 1 root root 192440 Feb 11 23:50 ma-6-big-Filter.db
> -rw-r--r-- 1 root root 10 Feb 11 23:50 ma-6-big-Digest.crc32
> -rw-r--r-- 1 root root 35212125 Feb 11 23:50 ma-6-big-Data.db
> -rw-r--r-- 1 root root 2156 Feb 11 23:50 ma-6-big-CRC.db
> -rw-r--r-- 1 root root 80 Feb 11 23:50 ma-7-big-TOC.txt
> -rw-r--r-- 1 root root 26518 Feb 11 23:50 ma-7-big-Summary.db
> -rw-r--r-- 1 root root 10264 Feb 11 23:50 ma-7-big-Statistics.db
> -rw-r--r-- 1 root root 2607614 Feb 11 23:50 ma-7-big-Index.db
> -rw-r--r-- 1 root root 192432 Feb 11 23:50 ma-7-big-Filter.db
> -rw-r--r-- 1 root root 9 Feb 11 23:50 ma-7-big-Digest.crc32
> -rw-r--r-- 1 root root 35190400 Feb 11 23:50 ma-7-big-Data.db
> -rw-r--r-- 1 root root 2152 Feb 11 23:50 ma-7-big-CRC.db
> -rw-r--r-- 1 root root 80 Feb 11 23:50 ma-5-big-TOC.txt
> -rw-r--r-- 1 root root 104178 Feb 11 23:50 ma-5-big-Summary.db
> -rw-r--r-- 1 root root 10264 Feb 11 23:50 ma-5-big-Statistics.db
> -rw-r--r-- 1 root root 10289077 Feb 11 23:50 ma-5-big-Index.db
> -rw-r--r-- 1 root root 757384 Feb 11 23:50 ma-5-big-Filter.db
> -rw-r--r-- 1 root root 9 Feb 11 23:50 ma-5-big-Digest.crc32
> -rw-r--r-- 1 root root 139201355 Feb 11 23:50 ma-5-big-Data.db
> -rw-r--r-- 1 root root 8508 Feb 11 23:50 ma-5-big-CRC.db
> root@bw-1:/srv/cassandra# md5sum
> /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/ma-5-big-Summary.db
> 5fca154fc790f7cfa37e8ad6d1c7552c
> {noformat}
> BF ratio changed, node restarted:
> {noformat}
> root@bw-1:/srv/cassandra# ls -ltr
> /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/
> total 242168
> drwxr-xr-x 2 root root 4096 Feb 11 23:34 backups
> -rw-r--r-- 1 root root 80 Feb 11 23:50 ma-6-big-TOC.txt
> -rw-r--r-- 1 root root 10264 Feb 11 23:50 ma-6-big-Statistics.db
> -rw-r--r-- 1 root root 2607705 Feb 11 23:50 ma-6-big-Index.db
> -rw-r--r-- 1 root root 192440 Feb 11 23:50 ma-6-big-Filter.db
> -rw-r--r-- 1 root root 10 Feb 11 23:50 ma-6-big-Digest.crc32
> -rw-r--r-- 1 root root 35212125 Feb 11 23:50 ma-6-big-Data.db
> -rw-r--r-- 1 root root 2156 Feb 11 23:50 ma-6-big-CRC.db
> -rw-r--r-- 1 root root 80 Feb 11 23:50 ma-7-big-TOC.txt
> -rw-r--r-- 1 root root 10264 Feb 11 23:50 ma-7-big-Statistics.db
> -rw-r--r-- 1 root root 2607614 Feb 11 23:50 ma-7-big-Index.db
> -rw-r--r-- 1 root root 192432 Feb 11 23:50 ma-7-big-Filter.db
> -rw-r--r-- 1 root root 9 Feb 11 23:50 ma-7-big-Digest.crc32
> -rw-r--r-- 1 root root 35190400 Feb 11 23:50 ma-7-big-Data.db
> -rw-r--r-- 1 root root 2152 Feb 11 23:50 ma-7-big-CRC.db
> -rw-r--r-- 1 root root 80 Feb 11 23:50 ma-5-big-TOC.txt
> -rw-r--r-- 1 root root 10264 Feb 11 23:50 ma-5-big-Statistics.db
> -rw-r--r-- 1 root root 10289077 Feb 11 23:50 ma-5-big-Index.db
> -rw-r--r-- 1 root root 757384 Feb 11 23:50 ma-5-big-Filter.db
> -rw-r--r-- 1 root root 9 Feb 11 23:50 ma-5-big-Digest.crc32
> -rw-r--r-- 1 root root 139201355 Feb 11 23:50 ma-5-big-Data.db
> -rw-r--r-- 1 root root 8508 Feb 11 23:50 ma-5-big-CRC.db
> -rw-r--r-- 1 root root 80 Feb 12 00:03 ma-8-big-TOC.txt
> -rw-r--r-- 1 root root 14902 Feb 12 00:03 ma-8-big-Summary.db
> -rw-r--r-- 1 root root 10264 Feb 12 00:03 ma-8-big-Statistics.db
> -rw-r--r-- 1 root root 1458631 Feb 12 00:03 ma-8-big-Index.db
> -rw-r--r-- 1 root root 10808 Feb 12 00:03 ma-8-big-Filter.db
> -rw-r--r-- 1 root root 10 Feb 12 00:03 ma-8-big-Digest.crc32
> -rw-r--r-- 1 root root 19660275 Feb 12 00:03 ma-8-big-Data.db
> -rw-r--r-- 1 root root 1204 Feb 12 00:03 ma-8-big-CRC.db
> -rw-r--r-- 1 root root 26518 Feb 12 00:04 ma-7-big-Summary.db
> -rw-r--r-- 1 root root 26518 Feb 12 00:04 ma-6-big-Summary.db
> -rw-r--r-- 1 root root 104178 Feb 12 00:04 ma-5-big-Summary.db
> root@bw-1:/srv/cassandra# md5sum
> /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/ma-5-big-Summary.db
>
> 5fca154fc790f7cfa37e8ad6d1c7552c
> {noformat}
> This hurts startup time and appears to do nothing useful whatsoever.