Re: [DISCUSS] inotify for detection of manually removed snapshots

Josh McKenzie Wed, 07 Aug 2024 05:54:33 -0700

> Snapshot metadata are currently stored in memory / they are cached so we do 
> not need to go to disk every single time we want to list them, the more 
> snapshots we have, the worse it is.
Are we enumerating our snapshots somewhere on the hot path, or is this 
performance concern misplaced?


On Wed, Aug 7, 2024, at 7:44 AM, Štefan Miklošovič wrote:
> Snapshot metadata are currently stored in memory / they are cached so we do 
> not need to go to disk every single time we want to list them, the more 
> snapshots we have, the worse it is.
> 
> When a snapshot is _manually_ removed from disk, not from nodetool 
> clearsnapshot, just by rm -rf on a respective snapshot directory, then such 
> snapshot will be still visible in nodetool listsnapshots. Manual removal of a 
> snapshot might be done e.g. by accident or by some "impatient" operator who 
> just goes to disk and removes it there instead of using nodetool or 
> respective JMX method.
> 
> To improve UX here, what I came up with is that we might use Java's 
> WatchService where each snapshot dir would be registered. WatchService is 
> part of Java, it uses inotify subsystem which is what Linux kernel offers. 
> The result of doing it is that once a snapshot dir is registered to be 
> watched and when it is removed then we are notified about that via inotify / 
> WatchService so we can react on it and remove the in-memory representation of 
> that so it will not be visible in the output anymore.
> 
> While this works, there are some questions / concerns
> 
> 1) What do people think about inotify in general? I tested this on 10k 
> snapshots and it seems to work just fine, nevertheless there is in general no 
> strong guarantee that every single event will come through, there is also a 
> family of kernel parameters around this where more tuning can be done etc. It 
> is also questionable how this will behave on other systems from Linux (Mac 
> etc). While JRE running on different platforms also implements this, I am not 
> completely sure these implementations are quality-wise the same as for Linux 
> etc. There is a history of not-so-quality implementations for other systems 
> (events not coming through on Macs etc) and while I think we are safe on 
> Linux, I am not sure we want to go with this elsewhere.
> 
> 2) inotify brings more entropy into the codebase, it is another thing we need 
> to take care of etc (however, it is all concentrated in one class and pretty 
> much "isolated" from everything else)
> 
> I made this feature optional and it is turned off by default so people need 
> to explicitly opt-in into this so we are not forcing it on anybody.
> 
> If we do not want to go with inotify, another option would be to have a 
> background thread which would periodically check if a manifest exists on a 
> disk, if it does not, then a snapshot does not either. While this works, what 
> I do not like about this is that the primary reason we moved it to memory was 
> to bypass IO as much as possible yet here we would introduce another check 
> which would go to disk, and this would be done periodically, which beats the 
> whole purpose. If an operator lists snapshots once a week and there is a 
> background check running every 10 minutes (for example), then the cummulative 
> number of IO operations migth be bigger than us just doing nothing at all. 
> For this reason, if we do not want to go with inotify, I would also not 
> implement any automatic background check. Instead of that, there would be 
> SnapshotManagerMbean#refresh() method which would just forcibly reload all 
> snapshots from scratch. I think that manual deletion of snapshots is not 
> something a user would do regularly, snapshots are meant to be removed via 
> nodetool or JMX. If manual removal ever happens then in order to make it 
> synchronized again, the refreshing of these snapshots would be required. 
> There might be an additional flag in nodetool listsnapshots, once specified, 
> refreshing would be done, otherwise not.
> 
> How does this all sound to people?
> 
> Regards

Re: [DISCUSS] inotify for detection of manually removed snapshots

Reply via email to