Re: [DISCUSS] inotify for detection of manually removed snapshots

Josh McKenzie Wed, 07 Aug 2024 14:28:52 -0700

> If you have a lot of snapshots and have for example a metric monitoring them 
> and their sizes, if you don’t cache it, creating the metric can cause 
> performance degradation. We added the cache because we saw this happen to 
> databases more than once.
I mean, I believe you, I'm just surprised querying out metadata for files and 
basic computation is leading to hundreds of ms pause times even on systems with 
a lot of files. Aren't most / all of these values cached at the filesystem 
layer so we're basically just tomato / tomahto caching systems, either one we 
maintain or one the OS maintains?


Or is there really just a count of files well outside what I'm thinking here?

Anyway, not trying to cause a ruckus and make needless noise, trying to learn. 
;)


On Wed, Aug 7, 2024, at 3:20 PM, Štefan Miklošovič wrote:
> 
> 
> On Wed, Aug 7, 2024 at 6:39 PM Yifan Cai <[email protected]> wrote:
>> With WatcherService, when events are missed (which is to be expected), you 
>> will still need to list the files. It seems to me that WatcherService 
>> doesn't offer significant benefits in this case.
>  
> Yeah I think we leave it out eventually.
> 
>> 
>> Regarding listing directory with a refresh flag, my concern is the potential 
>> for abuse. End-users might/could always refresh before listing, which could 
>> undermine the purpose of caching. Perhaps Jeremiah can provide more insight 
>> on this.
> 
> Well, by default, it would not be refreshed every single time. You would need 
> to opt-in into that. If there is a shop which has users with a direct access 
> to the disk of Cassandra nodes and they are removing data manually, I do not 
> know what to say, what is nodetool clearsnapshot and jmx methods good for 
> then? I do not think we can prevent people from shooting into their feet if 
> they are absolutely willing to do that.
> 
> If they want to refresh that every time, that would be equal to the current 
> behavior. It would be at most as "bad" as it is now.
>  
>> IMO, caching is best handled internally. I have a few UX-related questions:
>> - Is it valid or acceptable to return stale data? If so, end-users have to 
>> do some form of validation before consuming each snapshot to account for 
>> potential deletions. 
> 
> answer below
> 
>> - Even if listsnapshot returns the most recent data, is it possible that 
>> some of the directories get deleted when end-users are accessing them? I 
>> think it is true. It, then, enforces end-users to do some validation first, 
>> similar to handling stale data. 
> 
> I think that what you were trying to say is that when at time T0 somebody 
> lists snapshots and at T1 somebody removes a snapshot manually then the list 
> of snapshots is not actual anymore? Sure. That is a thing. This is how it 
> currently works. 
> 
> Now, we want to cache them, so if you clear a snapshot which is not 
> physically there because somebody removed it manually, that should be a 
> no-op, it will be just removed from the internal tracker. So, if it is at 
> disk and in cache and you clear it, then all is fine. It is fine too if it is 
> not on disk anymore and you clear it, then it is just removed internally. It 
> would fail only in case you want to remove a snapshot which is not cached, 
> regardless whether it is on disk or not. The deletion of non-existing 
> snapshot ends up with a failure, nothing should be changed in that regard, 
> this is the current behavior too.
> 
> I want to say that I did not write it completely correctly at the very 
> beginning of this thread. Currently, we are caching only _expiring_ 
> snapshots, because we need to know what is their time of removal so we act on 
> it later. _normal_ snapshots are _not_ cached _yet_. I spent so much time 
> with 18111 that I live in a reality where it is already in, I forgot this is 
> not actually in place yet, we are very close to that.
>  
> OK thank you all for your insights, I will NOT use inotify.
> 
>> 
>> Just my 2 cents. 
>> 
>> - Yifan
>> 
>> On Wed, Aug 7, 2024 at 6:03 AM Štefan Miklošovič <[email protected]> 
>> wrote:
>>> Yes, for example as reported here
>>> 
>>> https://issues.apache.org/jira/browse/CASSANDRA-13338
>>> 
>>> People who are charting this in monitoring dashboards might also hit this.
>>> 
>>> On Wed, Aug 7, 2024 at 2:59 PM J. D. Jordan <[email protected]> 
>>> wrote:
>>>> If you have a lot of snapshots and have for example a metric monitoring 
>>>> them and their sizes, if you don’t cache it, creating the metric can cause 
>>>> performance degradation. We added the cache because we saw this happen to 
>>>> databases more than once.
>>>> 
>>>> > On Aug 7, 2024, at 7:54 AM, Josh McKenzie <[email protected]> wrote:
>>>> > 
>>>> > 
>>>> >> 
>>>> >> Snapshot metadata are currently stored in memory / they are cached so 
>>>> >> we do not need to go to disk every single time we want to list them, 
>>>> >> the more snapshots we have, the worse it is.
>>>> > Are we enumerating our snapshots somewhere on the hot path, or is this 
>>>> > performance concern misplaced?
>>>> > 
>>>> >> On Wed, Aug 7, 2024, at 7:44 AM, Štefan Miklošovič wrote:
>>>> >> Snapshot metadata are currently stored in memory / they are cached so 
>>>> >> we do not need to go to disk every single time we want to list them, 
>>>> >> the more snapshots we have, the worse it is.
>>>> >> 
>>>> >> When a snapshot is _manually_ removed from disk, not from nodetool 
>>>> >> clearsnapshot, just by rm -rf on a respective snapshot directory, then 
>>>> >> such snapshot will be still visible in nodetool listsnapshots. Manual 
>>>> >> removal of a snapshot might be done e.g. by accident or by some 
>>>> >> "impatient" operator who just goes to disk and removes it there instead 
>>>> >> of using nodetool or respective JMX method.
>>>> >> 
>>>> >> To improve UX here, what I came up with is that we might use Java's 
>>>> >> WatchService where each snapshot dir would be registered. WatchService 
>>>> >> is part of Java, it uses inotify subsystem which is what Linux kernel 
>>>> >> offers. The result of doing it is that once a snapshot dir is 
>>>> >> registered to be watched and when it is removed then we are notified 
>>>> >> about that via inotify / WatchService so we can react on it and remove 
>>>> >> the in-memory representation of that so it will not be visible in the 
>>>> >> output anymore.
>>>> >> 
>>>> >> While this works, there are some questions / concerns
>>>> >> 
>>>> >> 1) What do people think about inotify in general? I tested this on 10k 
>>>> >> snapshots and it seems to work just fine, nevertheless there is in 
>>>> >> general no strong guarantee that every single event will come through, 
>>>> >> there is also a family of kernel parameters around this where more 
>>>> >> tuning can be done etc. It is also questionable how this will behave on 
>>>> >> other systems from Linux (Mac etc). While JRE running on different 
>>>> >> platforms also implements this, I am not completely sure these 
>>>> >> implementations are quality-wise the same as for Linux etc. There is a 
>>>> >> history of not-so-quality implementations for other systems (events not 
>>>> >> coming through on Macs etc) and while I think we are safe on Linux, I 
>>>> >> am not sure we want to go with this elsewhere.
>>>> >> 
>>>> >> 2) inotify brings more entropy into the codebase, it is another thing 
>>>> >> we need to take care of etc (however, it is all concentrated in one 
>>>> >> class and pretty much "isolated" from everything else)
>>>> >> 
>>>> >> I made this feature optional and it is turned off by default so people 
>>>> >> need to explicitly opt-in into this so we are not forcing it on anybody.
>>>> >> 
>>>> >> If we do not want to go with inotify, another option would be to have a 
>>>> >> background thread which would periodically check if a manifest exists 
>>>> >> on a disk, if it does not, then a snapshot does not either. While this 
>>>> >> works, what I do not like about this is that the primary reason we 
>>>> >> moved it to memory was to bypass IO as much as possible yet here we 
>>>> >> would introduce another check which would go to disk, and this would be 
>>>> >> done periodically, which beats the whole purpose. If an operator lists 
>>>> >> snapshots once a week and there is a background check running every 10 
>>>> >> minutes (for example), then the cummulative number of IO operations 
>>>> >> migth be bigger than us just doing nothing at all. For this reason, if 
>>>> >> we do not want to go with inotify, I would also not implement any 
>>>> >> automatic background check. Instead of that, there would be 
>>>> >> SnapshotManagerMbean#refresh() method which would just forcibly reload 
>>>> >> all snapshots from scratch. I think that manual deletion of snapshots 
>>>> >> is not something a user would do regularly, snapshots are meant to be 
>>>> >> removed via nodetool or JMX. If manual removal ever happens then in 
>>>> >> order to make it synchronized again, the refreshing of these snapshots 
>>>> >> would be required. There might be an additional flag in nodetool 
>>>> >> listsnapshots, once specified, refreshing would be done, otherwise not.
>>>> >> 
>>>> >> How does this all sound to people?
>>>> >> 
>>>> >> Regards
>>>> >

Re: [DISCUSS] inotify for detection of manually removed snapshots

Reply via email to