[
https://issues.apache.org/jira/browse/CASSANDRA-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371081#comment-16371081
]
Chris Lohfink edited comment on CASSANDRA-11163 at 2/21/18 8:26 AM:
--------------------------------------------------------------------
* In {{load(ValidationMetadata validation, boolean isOffline)}}, everywhere you're
calling {{load(bool, true)}} you can instead call {{load(bool, !isOffline)}},
since you never want to save the summary in those other situations either. This
will break your test, but IMHO that test is checking that the wrong case occurs:
if the summary file is not there, it should not be created. Tools and such may
be running as a different user; if someone runs this on a data directory and
this occurs, it will create a file that C* would be unable to delete, causing
compaction threads to die and backup etc. I think that in offline mode the
tools should _never_ delete, touch, or create unnecessary files, especially the
summary/bf files, since they are mostly there to speed up startup and are not
necessary for the reader to work anyway. You can also make
"recreateBloomFilter" always false in offline mode (wherever it's true, put
!isOffline instead), since it will then just use what's there. The one
exception is where the FILTER component is missing, where you can just use an
AlwaysPresent bf and skip, so that code that uses it doesn't NPE.
* In the unit tests, is the 1000ms sleep necessary? lastModified is in ms, so I
thought it may be OK to set it lower.
* Just checking it out and running it over and over, the unit test fails
occasionally (rarely): at the line 407 check, {{assertNotEquals(bloomModified,
bloomFile.lastModified());}}, the two values are the same.
* NP: I think you can reuse the last option (track hotness), since it's
currently false only in situations where we don't want or need to recreate
anyway. If so, rename it to something like "allowChanges". That way we are not
adding additional booleans to the end of that load function.
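A rough sketch of the behaviour suggested above, to make the offline case concrete. This is not Cassandra's code: {{OfflineLoadSketch}}, its {{Filter}} interface, {{ALWAYS_PRESENT}} and this {{load}} signature are hypothetical stand-ins, and the file writes are placeholders for rebuilding the summary or bloom filter. It only illustrates the idea that a single "allowChanges" flag (false when offline) stops any component file from being created, and that a missing FILTER component falls back to a pass-through filter rather than null.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class OfflineLoadSketch {

    /** Hypothetical stand-in for a bloom filter; not Cassandra's actual filter interface. */
    interface Filter {
        boolean mightContain(String key);
    }

    /** Answers "maybe" for every key, so callers always fall through to the real lookup. */
    static final Filter ALWAYS_PRESENT = key -> true;

    /**
     * Hypothetical loader. allowChanges plays the role of the renamed last argument:
     * when false (offline tools), no component file is created or rewritten.
     */
    static Filter load(Path summaryFile, Path filterFile, boolean allowChanges) {
        try {
            if (!Files.exists(summaryFile) && allowChanges) {
                // Placeholder for rebuilding the summary and saving it to disk.
                Files.write(summaryFile, new byte[0]);
            }
            if (!Files.exists(filterFile)) {
                if (!allowChanges) {
                    // Offline and no FILTER component: return a pass-through filter
                    // instead of null, and leave the directory untouched.
                    return ALWAYS_PRESENT;
                }
                // Placeholder for recreating the bloom filter and saving it.
                Files.write(filterFile, new byte[0]);
            }
            // Placeholder for deserializing the filter found on disk.
            return key -> true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sstable-sketch");
        Path summary = dir.resolve("ma-5-big-Summary.db");
        Path filter = dir.resolve("ma-5-big-Filter.db");

        boolean isOffline = true;
        Filter f = load(summary, filter, !isOffline);
        System.out.println("offline filter usable: " + f.mightContain("k"));
        System.out.println("summary created offline: " + Files.exists(summary));
        System.out.println("filter created offline: " + Files.exists(filter));
    }
}
```

With a shape like this, callers pass {{!isOffline}} wherever they currently pass {{true}}, and the load function does not need an extra boolean.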
> Summaries are needlessly rebuilt when the BF FP ratio is changed
> ----------------------------------------------------------------
>
> Key: CASSANDRA-11163
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11163
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Brandon Williams
> Assignee: Kurt Greaves
> Priority: Major
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> This is from trunk, but I also saw this happen on 2.0:
> Before:
> {noformat}
> root@bw-1:/srv/cassandra# ls -ltr
> /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/
> total 221460
> drwxr-xr-x 2 root root 4096 Feb 11 23:34 backups
> -rw-r--r-- 1 root root 80 Feb 11 23:50 ma-6-big-TOC.txt
> -rw-r--r-- 1 root root 26518 Feb 11 23:50 ma-6-big-Summary.db
> -rw-r--r-- 1 root root 10264 Feb 11 23:50 ma-6-big-Statistics.db
> -rw-r--r-- 1 root root 2607705 Feb 11 23:50 ma-6-big-Index.db
> -rw-r--r-- 1 root root 192440 Feb 11 23:50 ma-6-big-Filter.db
> -rw-r--r-- 1 root root 10 Feb 11 23:50 ma-6-big-Digest.crc32
> -rw-r--r-- 1 root root 35212125 Feb 11 23:50 ma-6-big-Data.db
> -rw-r--r-- 1 root root 2156 Feb 11 23:50 ma-6-big-CRC.db
> -rw-r--r-- 1 root root 80 Feb 11 23:50 ma-7-big-TOC.txt
> -rw-r--r-- 1 root root 26518 Feb 11 23:50 ma-7-big-Summary.db
> -rw-r--r-- 1 root root 10264 Feb 11 23:50 ma-7-big-Statistics.db
> -rw-r--r-- 1 root root 2607614 Feb 11 23:50 ma-7-big-Index.db
> -rw-r--r-- 1 root root 192432 Feb 11 23:50 ma-7-big-Filter.db
> -rw-r--r-- 1 root root 9 Feb 11 23:50 ma-7-big-Digest.crc32
> -rw-r--r-- 1 root root 35190400 Feb 11 23:50 ma-7-big-Data.db
> -rw-r--r-- 1 root root 2152 Feb 11 23:50 ma-7-big-CRC.db
> -rw-r--r-- 1 root root 80 Feb 11 23:50 ma-5-big-TOC.txt
> -rw-r--r-- 1 root root 104178 Feb 11 23:50 ma-5-big-Summary.db
> -rw-r--r-- 1 root root 10264 Feb 11 23:50 ma-5-big-Statistics.db
> -rw-r--r-- 1 root root 10289077 Feb 11 23:50 ma-5-big-Index.db
> -rw-r--r-- 1 root root 757384 Feb 11 23:50 ma-5-big-Filter.db
> -rw-r--r-- 1 root root 9 Feb 11 23:50 ma-5-big-Digest.crc32
> -rw-r--r-- 1 root root 139201355 Feb 11 23:50 ma-5-big-Data.db
> -rw-r--r-- 1 root root 8508 Feb 11 23:50 ma-5-big-CRC.db
> root@bw-1:/srv/cassandra# md5sum
> /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/ma-5-big-Summary.db
> 5fca154fc790f7cfa37e8ad6d1c7552c
> {noformat}
> BF ratio changed, node restarted:
> {noformat}
> root@bw-1:/srv/cassandra# ls -ltr
> /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/
> total 242168
> drwxr-xr-x 2 root root 4096 Feb 11 23:34 backups
> -rw-r--r-- 1 root root 80 Feb 11 23:50 ma-6-big-TOC.txt
> -rw-r--r-- 1 root root 10264 Feb 11 23:50 ma-6-big-Statistics.db
> -rw-r--r-- 1 root root 2607705 Feb 11 23:50 ma-6-big-Index.db
> -rw-r--r-- 1 root root 192440 Feb 11 23:50 ma-6-big-Filter.db
> -rw-r--r-- 1 root root 10 Feb 11 23:50 ma-6-big-Digest.crc32
> -rw-r--r-- 1 root root 35212125 Feb 11 23:50 ma-6-big-Data.db
> -rw-r--r-- 1 root root 2156 Feb 11 23:50 ma-6-big-CRC.db
> -rw-r--r-- 1 root root 80 Feb 11 23:50 ma-7-big-TOC.txt
> -rw-r--r-- 1 root root 10264 Feb 11 23:50 ma-7-big-Statistics.db
> -rw-r--r-- 1 root root 2607614 Feb 11 23:50 ma-7-big-Index.db
> -rw-r--r-- 1 root root 192432 Feb 11 23:50 ma-7-big-Filter.db
> -rw-r--r-- 1 root root 9 Feb 11 23:50 ma-7-big-Digest.crc32
> -rw-r--r-- 1 root root 35190400 Feb 11 23:50 ma-7-big-Data.db
> -rw-r--r-- 1 root root 2152 Feb 11 23:50 ma-7-big-CRC.db
> -rw-r--r-- 1 root root 80 Feb 11 23:50 ma-5-big-TOC.txt
> -rw-r--r-- 1 root root 10264 Feb 11 23:50 ma-5-big-Statistics.db
> -rw-r--r-- 1 root root 10289077 Feb 11 23:50 ma-5-big-Index.db
> -rw-r--r-- 1 root root 757384 Feb 11 23:50 ma-5-big-Filter.db
> -rw-r--r-- 1 root root 9 Feb 11 23:50 ma-5-big-Digest.crc32
> -rw-r--r-- 1 root root 139201355 Feb 11 23:50 ma-5-big-Data.db
> -rw-r--r-- 1 root root 8508 Feb 11 23:50 ma-5-big-CRC.db
> -rw-r--r-- 1 root root 80 Feb 12 00:03 ma-8-big-TOC.txt
> -rw-r--r-- 1 root root 14902 Feb 12 00:03 ma-8-big-Summary.db
> -rw-r--r-- 1 root root 10264 Feb 12 00:03 ma-8-big-Statistics.db
> -rw-r--r-- 1 root root 1458631 Feb 12 00:03 ma-8-big-Index.db
> -rw-r--r-- 1 root root 10808 Feb 12 00:03 ma-8-big-Filter.db
> -rw-r--r-- 1 root root 10 Feb 12 00:03 ma-8-big-Digest.crc32
> -rw-r--r-- 1 root root 19660275 Feb 12 00:03 ma-8-big-Data.db
> -rw-r--r-- 1 root root 1204 Feb 12 00:03 ma-8-big-CRC.db
> -rw-r--r-- 1 root root 26518 Feb 12 00:04 ma-7-big-Summary.db
> -rw-r--r-- 1 root root 26518 Feb 12 00:04 ma-6-big-Summary.db
> -rw-r--r-- 1 root root 104178 Feb 12 00:04 ma-5-big-Summary.db
> root@bw-1:/srv/cassandra# md5sum
> /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/ma-5-big-Summary.db
>
> 5fca154fc790f7cfa37e8ad6d1c7552c
> {noformat}
> This hurts startup time and appears to do nothing useful whatsoever.