[ 
https://issues.apache.org/jira/browse/CASSANDRA-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387002#comment-16387002
 ] 

Kurt Greaves commented on CASSANDRA-11163:
------------------------------------------

Correct. As I noted previously:

bq. Only regenerate and persist the bloomfilter when it's missing - not when it 
has changed. This means we rely on compactions/upgradesstables to update the 
bloomfilter.
bq. There's definitely no reason to regenerate Summaries in this case, and as 
previously mentioned it's not great regenerating the bloomfilter unless you're 
going to persist it. I have added persistence for the bloomfilter (when it is 
regenerated), however I think it's a bad idea to do this on startup as it will 
likely be more time consuming than regenerating the summaries.

So the previous behaviour in this case was to regenerate the BF on startup but 
*not* persist it (this meant it would be regenerated on every startup until 
compactions/upgradesstables had occurred). The summaries would also be 
regenerated and persisted on the next startup (pointlessly). Both of these 
would slow startup significantly, depending on how much data you had.

The new behaviour would be to avoid regenerating BF/Summaries at all on startup 
and instead rely on upgradesstables/compactions to update them. Summaries would 
only be recreated when necessary (when not loaded/corrupt/missing).
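
To make the difference concrete, here is a minimal sketch of the two startup 
decisions as described in this comment. It is not the actual Cassandra code: 
the SSTableState type, its fields, and the action strings are all hypothetical 
names chosen for illustration, and the handling of a missing filter under the 
previous behaviour is left out because it is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTableState:
    """Hypothetical view of one SSTable's components at startup."""
    filter_present: bool     # Filter.db exists and could be loaded
    fp_chance_changed: bool  # configured BF FP ratio differs from the stored filter's
    summary_usable: bool     # Summary.db loaded cleanly (not missing/corrupt)


def previous_startup_actions(s: SSTableState) -> List[str]:
    """Previous behaviour, as described in the comment (FP-ratio-changed case)."""
    actions = []
    if s.fp_chance_changed:
        # The filter was rebuilt but never written out, so this repeated on
        # every startup until a compaction/upgradesstables rewrote the SSTable.
        actions.append("regenerate bloom filter (not persisted)")
        # The summary was rebuilt and rewritten even though it was fine.
        actions.append("regenerate and persist summary")
    elif not s.summary_usable:
        actions.append("regenerate and persist summary")
    return actions


def proposed_startup_actions(s: SSTableState) -> List[str]:
    """Proposed behaviour: a changed FP ratio alone triggers nothing on startup."""
    actions = []
    if not s.filter_present:
        actions.append("regenerate and persist bloom filter")
    if not s.summary_usable:
        actions.append("regenerate and persist summary")
    return actions


if __name__ == "__main__":
    changed = SSTableState(filter_present=True, fp_chance_changed=True,
                           summary_usable=True)
    print("previous:", previous_startup_actions(changed))
    print("proposed:", proposed_startup_actions(changed))
```

In this sketch, an SSTable whose only difference is a changed FP ratio costs 
two rebuilds under the previous behaviour and none under the proposed one; 
the filter is then brought up to date by the next compaction or 
upgradesstables run.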

In trunk it might make sense to also add a nodetool command that will allow us 
to regenerate the bloomfilters/summaries/etc without re-writing the whole data 
file.

> Summaries are needlessly rebuilt when the BF FP ratio is changed
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-11163
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11163
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Kurt Greaves
>            Priority: Major
>             Fix For: 3.0.x, 3.11.x, 4.x
>
>
> This is from trunk, but I also saw this happen on 2.0:
> Before:
> {noformat}
> root@bw-1:/srv/cassandra# ls -ltr /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/
> total 221460
> drwxr-xr-x 2 root root      4096 Feb 11 23:34 backups
> -rw-r--r-- 1 root root        80 Feb 11 23:50 ma-6-big-TOC.txt
> -rw-r--r-- 1 root root     26518 Feb 11 23:50 ma-6-big-Summary.db
> -rw-r--r-- 1 root root     10264 Feb 11 23:50 ma-6-big-Statistics.db
> -rw-r--r-- 1 root root   2607705 Feb 11 23:50 ma-6-big-Index.db
> -rw-r--r-- 1 root root    192440 Feb 11 23:50 ma-6-big-Filter.db
> -rw-r--r-- 1 root root        10 Feb 11 23:50 ma-6-big-Digest.crc32
> -rw-r--r-- 1 root root  35212125 Feb 11 23:50 ma-6-big-Data.db
> -rw-r--r-- 1 root root      2156 Feb 11 23:50 ma-6-big-CRC.db
> -rw-r--r-- 1 root root        80 Feb 11 23:50 ma-7-big-TOC.txt
> -rw-r--r-- 1 root root     26518 Feb 11 23:50 ma-7-big-Summary.db
> -rw-r--r-- 1 root root     10264 Feb 11 23:50 ma-7-big-Statistics.db
> -rw-r--r-- 1 root root   2607614 Feb 11 23:50 ma-7-big-Index.db
> -rw-r--r-- 1 root root    192432 Feb 11 23:50 ma-7-big-Filter.db
> -rw-r--r-- 1 root root         9 Feb 11 23:50 ma-7-big-Digest.crc32
> -rw-r--r-- 1 root root  35190400 Feb 11 23:50 ma-7-big-Data.db
> -rw-r--r-- 1 root root      2152 Feb 11 23:50 ma-7-big-CRC.db
> -rw-r--r-- 1 root root        80 Feb 11 23:50 ma-5-big-TOC.txt
> -rw-r--r-- 1 root root    104178 Feb 11 23:50 ma-5-big-Summary.db
> -rw-r--r-- 1 root root     10264 Feb 11 23:50 ma-5-big-Statistics.db
> -rw-r--r-- 1 root root  10289077 Feb 11 23:50 ma-5-big-Index.db
> -rw-r--r-- 1 root root    757384 Feb 11 23:50 ma-5-big-Filter.db
> -rw-r--r-- 1 root root         9 Feb 11 23:50 ma-5-big-Digest.crc32
> -rw-r--r-- 1 root root 139201355 Feb 11 23:50 ma-5-big-Data.db
> -rw-r--r-- 1 root root      8508 Feb 11 23:50 ma-5-big-CRC.db
> root@bw-1:/srv/cassandra# md5sum /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/ma-5-big-Summary.db
> 5fca154fc790f7cfa37e8ad6d1c7552c
> {noformat}
> BF ratio changed, node restarted:
> {noformat}
> root@bw-1:/srv/cassandra# ls -ltr /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/
> total 242168
> drwxr-xr-x 2 root root      4096 Feb 11 23:34 backups
> -rw-r--r-- 1 root root        80 Feb 11 23:50 ma-6-big-TOC.txt
> -rw-r--r-- 1 root root     10264 Feb 11 23:50 ma-6-big-Statistics.db
> -rw-r--r-- 1 root root   2607705 Feb 11 23:50 ma-6-big-Index.db
> -rw-r--r-- 1 root root    192440 Feb 11 23:50 ma-6-big-Filter.db
> -rw-r--r-- 1 root root        10 Feb 11 23:50 ma-6-big-Digest.crc32
> -rw-r--r-- 1 root root  35212125 Feb 11 23:50 ma-6-big-Data.db
> -rw-r--r-- 1 root root      2156 Feb 11 23:50 ma-6-big-CRC.db
> -rw-r--r-- 1 root root        80 Feb 11 23:50 ma-7-big-TOC.txt
> -rw-r--r-- 1 root root     10264 Feb 11 23:50 ma-7-big-Statistics.db
> -rw-r--r-- 1 root root   2607614 Feb 11 23:50 ma-7-big-Index.db
> -rw-r--r-- 1 root root    192432 Feb 11 23:50 ma-7-big-Filter.db
> -rw-r--r-- 1 root root         9 Feb 11 23:50 ma-7-big-Digest.crc32
> -rw-r--r-- 1 root root  35190400 Feb 11 23:50 ma-7-big-Data.db
> -rw-r--r-- 1 root root      2152 Feb 11 23:50 ma-7-big-CRC.db
> -rw-r--r-- 1 root root        80 Feb 11 23:50 ma-5-big-TOC.txt
> -rw-r--r-- 1 root root     10264 Feb 11 23:50 ma-5-big-Statistics.db
> -rw-r--r-- 1 root root  10289077 Feb 11 23:50 ma-5-big-Index.db
> -rw-r--r-- 1 root root    757384 Feb 11 23:50 ma-5-big-Filter.db
> -rw-r--r-- 1 root root         9 Feb 11 23:50 ma-5-big-Digest.crc32
> -rw-r--r-- 1 root root 139201355 Feb 11 23:50 ma-5-big-Data.db
> -rw-r--r-- 1 root root      8508 Feb 11 23:50 ma-5-big-CRC.db
> -rw-r--r-- 1 root root        80 Feb 12 00:03 ma-8-big-TOC.txt
> -rw-r--r-- 1 root root     14902 Feb 12 00:03 ma-8-big-Summary.db
> -rw-r--r-- 1 root root     10264 Feb 12 00:03 ma-8-big-Statistics.db
> -rw-r--r-- 1 root root   1458631 Feb 12 00:03 ma-8-big-Index.db
> -rw-r--r-- 1 root root     10808 Feb 12 00:03 ma-8-big-Filter.db
> -rw-r--r-- 1 root root        10 Feb 12 00:03 ma-8-big-Digest.crc32
> -rw-r--r-- 1 root root  19660275 Feb 12 00:03 ma-8-big-Data.db
> -rw-r--r-- 1 root root      1204 Feb 12 00:03 ma-8-big-CRC.db
> -rw-r--r-- 1 root root     26518 Feb 12 00:04 ma-7-big-Summary.db
> -rw-r--r-- 1 root root     26518 Feb 12 00:04 ma-6-big-Summary.db
> -rw-r--r-- 1 root root    104178 Feb 12 00:04 ma-5-big-Summary.db
> root@bw-1:/srv/cassandra# md5sum /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/ma-5-big-Summary.db
> 5fca154fc790f7cfa37e8ad6d1c7552c
> {noformat}
> This hurts startup time and appears to do nothing useful whatsoever.
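
The check in the description (same md5sum, newer timestamp) can be automated 
across all Summary.db files in a table directory. The following is only a 
sketch: the function names are made up for illustration, and it assumes you 
take one snapshot before the restart and one after.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple

# Maps file name -> (md5 hex digest, modification time in nanoseconds)
Snapshot = Dict[str, Tuple[str, int]]


def snapshot_summaries(table_dir: str) -> Snapshot:
    """Record digest and mtime for every *-Summary.db file in table_dir."""
    result: Snapshot = {}
    for path in sorted(Path(table_dir).glob("*-Summary.db")):
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        result[path.name] = (digest, path.stat().st_mtime_ns)
    return result


def rewritten_but_identical(before: Snapshot, after: Snapshot) -> List[str]:
    """Names of files whose mtime changed while their content stayed the same."""
    return sorted(
        name
        for name, (digest, mtime) in before.items()
        if name in after
        and after[name][0] == digest
        and after[name][1] != mtime
    )
```

Any name returned by rewritten_but_identical() is a summary that was rewritten 
during the restart without its contents changing, which is the wasted work 
this issue describes.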



