Yes, for example as reported here https://issues.apache.org/jira/browse/CASSANDRA-13338
People who are charting this in monitoring dashboards might also hit this. On Wed, Aug 7, 2024 at 2:59 PM J. D. Jordan <jeremiah.jor...@gmail.com> wrote: > If you have a lot of snapshots and have for example a metric monitoring > them and their sizes, if you don’t cache it, creating the metric can cause > performance degradation. We added the cache because we saw this happen to > databases more than once. > > > On Aug 7, 2024, at 7:54 AM, Josh McKenzie <jmcken...@apache.org> wrote: > > > > > >> > >> Snapshot metadata are currently stored in memory / they are cached so > we do not need to go to disk every single time we want to list them, the > more snapshots we have, the worse it is. > > Are we enumerating our snapshots somewhere on the hot path, or is this > performance concern misplaced? > > > >> On Wed, Aug 7, 2024, at 7:44 AM, Štefan Miklošovič wrote: > >> Snapshot metadata are currently stored in memory / they are cached so > we do not need to go to disk every single time we want to list them, the > more snapshots we have, the worse it is. > >> > >> When a snapshot is _manually_ removed from disk, not from nodetool > clearsnapshot, just by rm -rf on a respective snapshot directory, then such > snapshot will be still visible in nodetool listsnapshots. Manual removal of > a snapshot might be done e.g. by accident or by some "impatient" operator > who just goes to disk and removes it there instead of using nodetool or > respective JMX method. > >> > >> To improve UX here, what I came up with is that we might use Java's > WatchService where each snapshot dir would be registered. WatchService is > part of Java, it uses inotify subsystem which is what Linux kernel offers. > The result of doing it is that once a snapshot dir is registered to be > watched and when it is removed then we are notified about that via inotify > / WatchService so we can react on it and remove the in-memory > representation of that so it will not be visible in the output anymore. > >> > >> While this works, there are some questions / concerns > >> > >> 1) What do people think about inotify in general? I tested this on 10k > snapshots and it seems to work just fine, nevertheless there is in general > no strong guarantee that every single event will come through, there is > also a family of kernel parameters around this where more tuning can be > done etc. It is also questionable how this will behave on other systems > from Linux (Mac etc). While JRE running on different platforms also > implements this, I am not completely sure these implementations are > quality-wise the same as for Linux etc. There is a history of > not-so-quality implementations for other systems (events not coming through > on Macs etc) and while I think we are safe on Linux, I am not sure we want > to go with this elsewhere. > >> > >> 2) inotify brings more entropy into the codebase, it is another thing > we need to take care of etc (however, it is all concentrated in one class > and pretty much "isolated" from everything else) > >> > >> I made this feature optional and it is turned off by default so people > need to explicitly opt-in into this so we are not forcing it on anybody. > >> > >> If we do not want to go with inotify, another option would be to have a > background thread which would periodically check if a manifest exists on a > disk, if it does not, then a snapshot does not either. While this works, > what I do not like about this is that the primary reason we moved it to > memory was to bypass IO as much as possible yet here we would introduce > another check which would go to disk, and this would be done periodically, > which beats the whole purpose. If an operator lists snapshots once a week > and there is a background check running every 10 minutes (for example), > then the cummulative number of IO operations migth be bigger than us just > doing nothing at all. For this reason, if we do not want to go with > inotify, I would also not implement any automatic background check. Instead > of that, there would be SnapshotManagerMbean#refresh() method which would > just forcibly reload all snapshots from scratch. I think that manual > deletion of snapshots is not something a user would do regularly, snapshots > are meant to be removed via nodetool or JMX. If manual removal ever happens > then in order to make it synchronized again, the refreshing of these > snapshots would be required. There might be an additional flag in nodetool > listsnapshots, once specified, refreshing would be done, otherwise not. > >> > >> How does this all sound to people? > >> > >> Regards > > >