I lean towards the documentation approach rather than complicating the implementation.
For me personally: I regularly use shell commands to operate on snapshots. That includes listing them. I probably should use nodetool for it all instead, though.

Jordan

On Fri, Aug 9, 2024 at 08:09 Štefan Miklošovič <smikloso...@apache.org> wrote:

> I understand and agree. It is just that it would be cool if we avoided the situation where there is a figurative ABC company which has these "bash scripts removing snapshots from cron by rm -rf every second Sunday at 3:00 am" because "that was their workflow for ages".
>
> I am particularly sensitive to this because Cassandra is very cautious when it comes to not disrupting the workflows already out there.
>
> I do not know how frequent this would be and whether somebody would start to complain. I mean ... they could still remove it by hand, right? It is just that listsnapshots would not be accurate anymore without refreshing it. I think that might be acceptable. It would be something else if we flat out forbade manual deletion, which we do not.
>
> On Fri, Aug 9, 2024 at 4:50 PM Bowen Song via dev <dev@cassandra.apache.org> wrote:
>
>> If we have the documentation in place, we can then consider the cache to be the master copy of the metadata and rely on it to be always accurate and up to date. If someone deletes snapshot files from the filesystem, they can't complain about Cassandra no longer working correctly - which is the same as if they had manually deleted some SSTable files (they shouldn't).
>>
>> On 09/08/2024 11:16, Štefan Miklošovič wrote:
>>
>> We could indeed do that. Does your suggestion mean that there should not be a problem with caching it all once it is explicitly stated like that?
>>
>> On Fri, Aug 9, 2024 at 12:01 PM Bowen Song via dev <dev@cassandra.apache.org> wrote:
>>
>>> Has anyone considered simply updating the documentation to say this?
>>>
>>> "Removing the snapshot files directly from the filesystem may break things. Always use the `nodetool` command or JMX to remove snapshots."
>>>
>>> On 09/08/2024 09:18, Štefan Miklošovič wrote:
>>>
>>> If we consider caching it all to be too much, we could make caching an option an admin would need to opt into. There might be a flag in cassandra.yaml; once enabled, snapshots would be kept in memory, otherwise they would be loaded from disk as before, so people can decide whether caching is enough for them or they want the previous behaviour (the default would be the previous behaviour). This puts additional complexity into SnapshotManager, but it should in general be doable.
>>>
>>> Let me know what you think. I would really like to have this resolved; 18111 brings a lot of code cleanup and simplifies things a lot.
>>>
>>> On Wed, Aug 7, 2024 at 11:30 PM Josh McKenzie <jmcken...@apache.org> wrote:
>>>
>>>> If you have a lot of snapshots and have, for example, a metric monitoring them and their sizes, creating the metric can cause performance degradation if you don't cache it. We added the cache because we saw this happen to databases more than once.
>>>>
>>>> I mean, I believe you, I'm just surprised that querying metadata for files plus some basic computation leads to hundreds of ms of pause time even on systems with a lot of files. Aren't most / all of these values cached at the filesystem layer, so we're basically just talking tomato / tomahto caching systems, either one we maintain or one the OS maintains?
>>>>
>>>> Or is there really just a count of files well outside what I'm thinking here?
>>>>
>>>> Anyway, not trying to cause a ruckus and make needless noise, just trying to learn. ;)
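To make the cost being debated here concrete, below is a rough, illustrative sketch (not Cassandra's actual implementation; the class and method names are made up) of what an uncached "true size of all snapshots" metric has to do on every scrape: walk each snapshot directory and stat every file in it.

```java
// Rough illustration (not Cassandra's code) of why an uncached snapshot-size
// metric gets expensive: every scrape walks every snapshot directory and
// stats every file in it.
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

public final class SnapshotSizeExample
{
    /** Sums file sizes under all given snapshot directories by walking them on every call. */
    public static long uncachedTrueSize(List<Path> snapshotDirs) throws IOException
    {
        long total = 0;
        for (Path dir : snapshotDirs)
        {
            try (Stream<Path> files = Files.walk(dir))
            {
                total += files.filter(Files::isRegularFile)
                              .mapToLong(p -> {
                                  try
                                  {
                                      return Files.size(p); // one stat() per file, per metric scrape
                                  }
                                  catch (IOException e)
                                  {
                                      throw new UncheckedIOException(e);
                                  }
                              })
                              .sum();
            }
        }
        return total;
    }
}
```

With tens of thousands of snapshot files, each scrape becomes tens of thousands of stat calls, which is the behaviour the in-memory cache was added to avoid.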
>>>> On Wed, Aug 7, 2024, at 3:20 PM, Štefan Miklošovič wrote:
>>>>
>>>> On Wed, Aug 7, 2024 at 6:39 PM Yifan Cai <yc25c...@gmail.com> wrote:
>>>>
>>>> With WatcherService, when events are missed (which is to be expected), you will still need to list the files. It seems to me that WatcherService doesn't offer significant benefits in this case.
>>>>
>>>> Yeah, I think we will leave it out eventually.
>>>>
>>>> Regarding listing the directory with a refresh flag, my concern is the potential for abuse. End-users might/could always refresh before listing, which could undermine the purpose of caching. Perhaps Jeremiah can provide more insight on this.
>>>>
>>>> Well, by default it would not be refreshed every single time; you would need to opt into that. If there is a shop where users have direct access to the disks of Cassandra nodes and they are removing data manually, I do not know what to say - what are nodetool clearsnapshot and the JMX methods good for then? I do not think we can prevent people from shooting themselves in the foot if they are absolutely determined to do that.
>>>>
>>>> If they want to refresh every time, that would be equal to the current behaviour. It would be at most as "bad" as it is now.
>>>>
>>>> IMO, caching is best handled internally. I have a few UX-related questions:
>>>> - Is it valid or acceptable to return stale data? If so, end-users have to do some form of validation before consuming each snapshot, to account for potential deletions.
>>>>
>>>> Answer below.
>>>>
>>>> - Even if listsnapshots returns the most recent data, is it possible that some of the directories get deleted while end-users are accessing them? I think it is. It then forces end-users to do some validation first, similar to handling stale data.
>>>>
>>>> I think what you are trying to say is that when somebody lists snapshots at time T0 and somebody removes a snapshot manually at T1, the list of snapshots is no longer accurate? Sure, that is a thing. This is how it currently works.
>>>>
>>>> Now, we want to cache them, so if you clear a snapshot which is not physically there because somebody removed it manually, that should be a no-op; it will just be removed from the internal tracker. So, if it is on disk and in the cache and you clear it, then all is fine. It is fine too if it is no longer on disk and you clear it - then it is just removed internally. It would fail only in the case where you want to remove a snapshot which is not cached, regardless of whether it is on disk or not. The deletion of a non-existing snapshot ends with a failure; nothing should change in that regard, as this is the current behaviour too.
>>>>
>>>> I want to say that I did not write it completely correctly at the very beginning of this thread. Currently, we are caching only _expiring_ snapshots, because we need to know their time of removal so we can act on it later. _Normal_ snapshots are _not_ cached _yet_. I spent so much time with 18111 that I live in a reality where it is already in; I forgot this is not actually in place yet, but we are very close to that.
>>>>
>>>> OK, thank you all for your insights, I will NOT use inotify.
>>>>
>>>> Just my 2 cents.
>>>>
>>>> - Yifan
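A minimal sketch of the clearing semantics described above, assuming a hypothetical in-memory tracker (the class and method names are illustrative, not Cassandra's actual SnapshotManager API): clearing a cached snapshot removes it from disk if it is still there and always drops it from the cache, while clearing a snapshot the tracker does not know about fails, just as deleting a non-existing snapshot does today.

```java
// Illustrative sketch of the clearing semantics discussed above; names are
// hypothetical, not Cassandra's actual API.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;

public final class SnapshotTrackerSketch
{
    // snapshot tag -> snapshot directory
    private final Map<String, Path> cached = new ConcurrentHashMap<>();

    public void clearSnapshot(String tag) throws IOException
    {
        Path dir = cached.remove(tag);
        if (dir == null)
        {
            // Not cached at all (regardless of what is on disk): same failure as
            // deleting a non-existing snapshot today.
            throw new IllegalArgumentException("Unknown snapshot: " + tag);
        }
        if (Files.exists(dir))
        {
            deleteRecursively(dir); // normal path: remove the files
        }
        // If someone already removed the directory manually, dropping it from the
        // cache is enough - the clear becomes a no-op on disk.
    }

    private static void deleteRecursively(Path dir) throws IOException
    {
        try (Stream<Path> paths = Files.walk(dir))
        {
            // delete children before parents; return value of delete() ignored in this sketch
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> p.toFile().delete());
        }
    }
}
```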
>>>> On Wed, Aug 7, 2024 at 6:03 AM Štefan Miklošovič <smikloso...@apache.org> wrote:
>>>>
>>>> Yes, for example as reported here:
>>>>
>>>> https://issues.apache.org/jira/browse/CASSANDRA-13338
>>>>
>>>> People who are charting this in monitoring dashboards might also hit this.
>>>>
>>>> On Wed, Aug 7, 2024 at 2:59 PM J. D. Jordan <jeremiah.jor...@gmail.com> wrote:
>>>>
>>>> If you have a lot of snapshots and have, for example, a metric monitoring them and their sizes, creating the metric can cause performance degradation if you don't cache it. We added the cache because we saw this happen to databases more than once.
>>>>
>>>> > On Aug 7, 2024, at 7:54 AM, Josh McKenzie <jmcken...@apache.org> wrote:
>>>> >
>>>> >> Snapshot metadata are currently stored in memory / they are cached so we do not need to go to disk every single time we want to list them; the more snapshots we have, the worse it gets.
>>>> >
>>>> > Are we enumerating our snapshots somewhere on the hot path, or is this performance concern misplaced?
>>>> >
>>>> >> On Wed, Aug 7, 2024, at 7:44 AM, Štefan Miklošovič wrote:
>>>> >>
>>>> >> Snapshot metadata are currently stored in memory / they are cached so we do not need to go to disk every single time we want to list them; the more snapshots we have, the worse it gets.
>>>> >>
>>>> >> When a snapshot is _manually_ removed from disk - not via nodetool clearsnapshot, just by rm -rf on the respective snapshot directory - such a snapshot will still be visible in nodetool listsnapshots. Manual removal of a snapshot might happen e.g. by accident, or by some "impatient" operator who just goes to the disk and removes it there instead of using nodetool or the respective JMX method.
>>>> >>
>>>> >> To improve UX here, what I came up with is that we might use Java's WatchService, where each snapshot dir would be registered. WatchService is part of Java; it uses the inotify subsystem which the Linux kernel offers. The result is that once a snapshot dir is registered to be watched and it is then removed, we are notified about it via inotify / WatchService, so we can react and remove the in-memory representation so it is no longer visible in the output.
>>>> >>
>>>> >> While this works, there are some questions / concerns:
>>>> >>
>>>> >> 1) What do people think about inotify in general? I tested this on 10k snapshots and it seems to work just fine; nevertheless, there is in general no strong guarantee that every single event will come through, and there is also a family of kernel parameters around this where more tuning can be done, etc. It is also questionable how this will behave on systems other than Linux (Mac etc.). While the JRE on other platforms also implements this, I am not completely sure those implementations are quality-wise on par with the Linux one. There is a history of not-so-great implementations on other systems (events not coming through on Macs etc.), and while I think we are safe on Linux, I am not sure we want to go with this elsewhere.
>>>> >> 2) inotify brings more entropy into the codebase; it is another thing we need to take care of, etc. (however, it is all concentrated in one class and pretty much "isolated" from everything else).
>>>> >>
>>>> >> I made this feature optional and it is turned off by default, so people need to explicitly opt into it and we are not forcing it on anybody.
>>>> >>
>>>> >> If we do not want to go with inotify, another option would be to have a background thread which would periodically check whether a manifest exists on disk; if it does not, then the snapshot does not exist either. While this works, what I do not like about it is that the primary reason we moved it to memory was to bypass IO as much as possible, yet here we would introduce another check which goes to disk, and it would run periodically, which defeats the whole purpose. If an operator lists snapshots once a week and there is a background check running every 10 minutes (for example), then the cumulative number of IO operations might be bigger than if we just did nothing at all. For this reason, if we do not want to go with inotify, I would also not implement any automatic background check. Instead, there would be a SnapshotManagerMbean#refresh() method which would just forcibly reload all snapshots from scratch. I think that manual deletion of snapshots is not something a user would do regularly; snapshots are meant to be removed via nodetool or JMX. If manual removal ever happens, then in order to make things synchronized again, refreshing the snapshots would be required. There might be an additional flag in nodetool listsnapshots; once specified, refreshing would be done, otherwise not.
>>>> >>
>>>> >> How does this all sound to people?
>>>> >>
>>>> >> Regards
>>>> >
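For reference, a minimal sketch of the WatchService approach described in the quoted proposal, assuming a hypothetical SnapshotWatcher class and an onSnapshotDirectoryDeleted hook (neither is the actual Cassandra API): the parent "snapshots" directory of a table is registered, and an ENTRY_DELETE event for a child directory drops the corresponding entry from the in-memory tracker.

```java
// Sketch of the WatchService idea; class and hook names are illustrative only.
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public final class SnapshotWatcher implements Runnable
{
    private final WatchService watchService;

    public SnapshotWatcher() throws IOException
    {
        this.watchService = FileSystems.getDefault().newWatchService();
    }

    /** Watch a table's "snapshots" directory so we see ENTRY_DELETE for its snapshot dirs. */
    public void register(Path snapshotsDir) throws IOException
    {
        snapshotsDir.register(watchService, StandardWatchEventKinds.ENTRY_DELETE);
    }

    @Override
    public void run()
    {
        while (!Thread.currentThread().isInterrupted())
        {
            WatchKey key;
            try
            {
                key = watchService.take(); // blocks until events are available
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                return;
            }
            Path watchedDir = (Path) key.watchable();
            for (WatchEvent<?> event : key.pollEvents())
            {
                if (event.kind() == StandardWatchEventKinds.OVERFLOW)
                    continue; // events were lost; only a full refresh can resynchronize here
                Path deleted = watchedDir.resolve((Path) event.context());
                onSnapshotDirectoryDeleted(deleted); // hypothetical hook: drop from the in-memory tracker
            }
            if (!key.reset())
            {
                // the watched directory itself is gone; its snapshots should be dropped as well
            }
        }
    }

    private void onSnapshotDirectoryDeleted(Path snapshotDir)
    {
        // In the proposal, this would remove the cached metadata for the snapshot
        // so it no longer appears in `nodetool listsnapshots`.
    }
}
```

The OVERFLOW branch is where the "missed events" concern raised earlier in the thread shows up: when the kernel drops events, the only safe recovery is a full re-listing, i.e. exactly the refresh() fallback discussed above.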