Re: [DISCUSS] inotify for detection of manually removed snapshots

Štefan Miklošovič Wed, 07 Aug 2024 06:03:00 -0700

Yes, for example as reported here

https://issues.apache.org/jira/browse/CASSANDRA-13338


People who are charting this in monitoring dashboards might also hit this.

On Wed, Aug 7, 2024 at 2:59 PM J. D. Jordan <[email protected]>
wrote:

> If you have a lot of snapshots and have for example a metric monitoring
> them and their sizes, if you don’t cache it, creating the metric can cause
> performance degradation. We added the cache because we saw this happen to
> databases more than once.
>
> > On Aug 7, 2024, at 7:54 AM, Josh McKenzie <[email protected]> wrote:
> >
> > 
> >>
> >> Snapshot metadata are currently stored in memory / they are cached so
> we do not need to go to disk every single time we want to list them, the
> more snapshots we have, the worse it is.
> > Are we enumerating our snapshots somewhere on the hot path, or is this
> performance concern misplaced?
> >
> >> On Wed, Aug 7, 2024, at 7:44 AM, Štefan Miklošovič wrote:
> >> Snapshot metadata are currently stored in memory / they are cached so
> we do not need to go to disk every single time we want to list them, the
> more snapshots we have, the worse it is.
> >>
> >> When a snapshot is _manually_ removed from disk, not from nodetool
> clearsnapshot, just by rm -rf on a respective snapshot directory, then such
> snapshot will be still visible in nodetool listsnapshots. Manual removal of
> a snapshot might be done e.g. by accident or by some "impatient" operator
> who just goes to disk and removes it there instead of using nodetool or
> respective JMX method.
> >>
> >> To improve UX here, what I came up with is that we might use Java's
> WatchService where each snapshot dir would be registered. WatchService is
> part of Java, it uses inotify subsystem which is what Linux kernel offers.
> The result of doing it is that once a snapshot dir is registered to be
> watched and when it is removed then we are notified about that via inotify
> / WatchService so we can react on it and remove the in-memory
> representation of that so it will not be visible in the output anymore.
> >>
> >> While this works, there are some questions / concerns
> >>
> >> 1) What do people think about inotify in general? I tested this on 10k
> snapshots and it seems to work just fine, nevertheless there is in general
> no strong guarantee that every single event will come through, there is
> also a family of kernel parameters around this where more tuning can be
> done etc. It is also questionable how this will behave on other systems
> from Linux (Mac etc). While JRE running on different platforms also
> implements this, I am not completely sure these implementations are
> quality-wise the same as for Linux etc. There is a history of
> not-so-quality implementations for other systems (events not coming through
> on Macs etc) and while I think we are safe on Linux, I am not sure we want
> to go with this elsewhere.
> >>
> >> 2) inotify brings more entropy into the codebase, it is another thing
> we need to take care of etc (however, it is all concentrated in one class
> and pretty much "isolated" from everything else)
> >>
> >> I made this feature optional and it is turned off by default so people
> need to explicitly opt-in into this so we are not forcing it on anybody.
> >>
> >> If we do not want to go with inotify, another option would be to have a
> background thread which would periodically check if a manifest exists on a
> disk, if it does not, then a snapshot does not either. While this works,
> what I do not like about this is that the primary reason we moved it to
> memory was to bypass IO as much as possible yet here we would introduce
> another check which would go to disk, and this would be done periodically,
> which beats the whole purpose. If an operator lists snapshots once a week
> and there is a background check running every 10 minutes (for example),
> then the cummulative number of IO operations migth be bigger than us just
> doing nothing at all. For this reason, if we do not want to go with
> inotify, I would also not implement any automatic background check. Instead
> of that, there would be SnapshotManagerMbean#refresh() method which would
> just forcibly reload all snapshots from scratch. I think that manual
> deletion of snapshots is not something a user would do regularly, snapshots
> are meant to be removed via nodetool or JMX. If manual removal ever happens
> then in order to make it synchronized again, the refreshing of these
> snapshots would be required. There might be an additional flag in nodetool
> listsnapshots, once specified, refreshing would be done, otherwise not.
> >>
> >> How does this all sound to people?
> >>
> >> Regards
> >
>

Re: [DISCUSS] inotify for detection of manually removed snapshots

Reply via email to