Re: [DISCUSS] inotify for detection of manually removed snapshots

Štefan Miklošovič Fri, 09 Aug 2024 03:18:00 -0700

We could indeed do that. Does your suggestion mean that there should not be
a problem with caching it all once explicitly stated like that?


On Fri, Aug 9, 2024 at 12:01 PM Bowen Song via dev <[email protected]>
wrote:

> Has anyone considered simply updating the documentation saying this?
>
> "Removing the snapshot files directly from the filesystem may break
> things. Always use the `nodetool` command or JMX to remove snapshots."
> On 09/08/2024 09:18, Štefan Miklošovič wrote:
>
> If we consider caching it all to be too much, we might probably make
> caching an option an admin would need to opt-in into? There might be a flag
> in cassandra.yaml, once enabled, it would be in memory, otherwise it would
> just load it as it was so people can decide if caching is enough for them
> or they want to have it as it was before (would be by default set to as it
> was). This puts additional complexity into SnapshotManager but it should be
> in general doable.
>
> Let me know what you think, I would really like to have this resolved,
> 18111 brings a lot of code cleanup and simplifies stuff a lot.
>
> On Wed, Aug 7, 2024 at 11:30 PM Josh McKenzie <[email protected]>
> wrote:
>
>> If you have a lot of snapshots and have for example a metric monitoring
>> them and their sizes, if you don’t cache it, creating the metric can cause
>> performance degradation. We added the cache because we saw this happen to
>> databases more than once.
>>
>> I mean, I believe you, I'm just surprised querying out metadata for files
>> and basic computation is leading to hundreds of ms pause times even on
>> systems with a lot of files. Aren't most / all of these values cached at
>> the filesystem layer so we're basically just tomato / tomahto caching
>> systems, either one we maintain or one the OS maintains?
>>
>> Or is there really just a count of files well outside what I'm thinking
>> here?
>>
>> Anyway, not trying to cause a ruckus and make needless noise, trying to
>> learn. ;)
>>
>>
>> On Wed, Aug 7, 2024, at 3:20 PM, Štefan Miklošovič wrote:
>>
>>
>>
>> On Wed, Aug 7, 2024 at 6:39 PM Yifan Cai <[email protected]> wrote:
>>
>> With WatcherService, when events are missed (which is to be expected),
>> you will still need to list the files. It seems to me that WatcherService
>> doesn't offer significant benefits in this case.
>>
>>
>> Yeah I think we leave it out eventually.
>>
>>
>> Regarding listing directory with a refresh flag, my concern is the
>> potential for abuse. End-users might/could always refresh before listing,
>> which could undermine the purpose of caching. Perhaps Jeremiah can provide
>> more insight on this.
>>
>>
>> Well, by default, it would not be refreshed every single time. You would
>> need to opt-in into that. If there is a shop which has users with a direct
>> access to the disk of Cassandra nodes and they are removing data manually,
>> I do not know what to say, what is nodetool clearsnapshot and jmx methods
>> good for then? I do not think we can prevent people from shooting into
>> their feet if they are absolutely willing to do that.
>>
>> If they want to refresh that every time, that would be equal to the
>> current behavior. It would be at most as "bad" as it is now.
>>
>>
>> IMO, caching is best handled internally. I have a few UX-related
>> questions:
>> - Is it valid or acceptable to return stale data? If so, end-users have
>> to do some form of validation before consuming each snapshot to account for
>> potential deletions.
>>
>>
>> answer below
>>
>> - Even if listsnapshot returns the most recent data, is it possible that
>> some of the directories get deleted when end-users are accessing them? I
>> think it is true. It, then, enforces end-users to do some validation first,
>> similar to handling stale data.
>>
>>
>> I think that what you were trying to say is that when at time T0 somebody
>> lists snapshots and at T1 somebody removes a snapshot manually then the
>> list of snapshots is not actual anymore? Sure. That is a thing. This is how
>> it currently works.
>>
>> Now, we want to cache them, so if you clear a snapshot which is not
>> physically there because somebody removed it manually, that should be a
>> no-op, it will be just removed from the internal tracker. So, if it is at
>> disk and in cache and you clear it, then all is fine. It is fine too if it
>> is not on disk anymore and you clear it, then it is just removed
>> internally. It would fail only in case you want to remove a snapshot which
>> is not cached, regardless whether it is on disk or not. The deletion of
>> non-existing snapshot ends up with a failure, nothing should be changed in
>> that regard, this is the current behavior too.
>>
>> I want to say that I did not write it completely correctly at the very
>> beginning of this thread. Currently, we are caching only _expiring_
>> snapshots, because we need to know what is their time of removal so we act
>> on it later. _normal_ snapshots are _not_ cached _yet_. I spent so much
>> time with 18111 that I live in a reality where it is already in, I forgot
>> this is not actually in place yet, we are very close to that.
>>
>> OK thank you all for your insights, I will NOT use inotify.
>>
>>
>> Just my 2 cents.
>>
>> - Yifan
>>
>> On Wed, Aug 7, 2024 at 6:03 AM Štefan Miklošovič <[email protected]>
>> wrote:
>>
>> Yes, for example as reported here
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-13338
>>
>> People who are charting this in monitoring dashboards might also hit this.
>>
>> On Wed, Aug 7, 2024 at 2:59 PM J. D. Jordan <[email protected]>
>> wrote:
>>
>> If you have a lot of snapshots and have for example a metric monitoring
>> them and their sizes, if you don’t cache it, creating the metric can cause
>> performance degradation. We added the cache because we saw this happen to
>> databases more than once.
>>
>> > On Aug 7, 2024, at 7:54 AM, Josh McKenzie <[email protected]> wrote:
>> >
>> > 
>> >>
>> >> Snapshot metadata are currently stored in memory / they are cached so
>> we do not need to go to disk every single time we want to list them, the
>> more snapshots we have, the worse it is.
>> > Are we enumerating our snapshots somewhere on the hot path, or is this
>> performance concern misplaced?
>> >
>> >> On Wed, Aug 7, 2024, at 7:44 AM, Štefan Miklošovič wrote:
>> >> Snapshot metadata are currently stored in memory / they are cached so
>> we do not need to go to disk every single time we want to list them, the
>> more snapshots we have, the worse it is.
>> >>
>> >> When a snapshot is _manually_ removed from disk, not from nodetool
>> clearsnapshot, just by rm -rf on a respective snapshot directory, then such
>> snapshot will be still visible in nodetool listsnapshots. Manual removal of
>> a snapshot might be done e.g. by accident or by some "impatient" operator
>> who just goes to disk and removes it there instead of using nodetool or
>> respective JMX method.
>> >>
>> >> To improve UX here, what I came up with is that we might use Java's
>> WatchService where each snapshot dir would be registered. WatchService is
>> part of Java, it uses inotify subsystem which is what Linux kernel offers.
>> The result of doing it is that once a snapshot dir is registered to be
>> watched and when it is removed then we are notified about that via inotify
>> / WatchService so we can react on it and remove the in-memory
>> representation of that so it will not be visible in the output anymore.
>> >>
>> >> While this works, there are some questions / concerns
>> >>
>> >> 1) What do people think about inotify in general? I tested this on 10k
>> snapshots and it seems to work just fine, nevertheless there is in general
>> no strong guarantee that every single event will come through, there is
>> also a family of kernel parameters around this where more tuning can be
>> done etc. It is also questionable how this will behave on other systems
>> from Linux (Mac etc). While JRE running on different platforms also
>> implements this, I am not completely sure these implementations are
>> quality-wise the same as for Linux etc. There is a history of
>> not-so-quality implementations for other systems (events not coming through
>> on Macs etc) and while I think we are safe on Linux, I am not sure we want
>> to go with this elsewhere.
>> >>
>> >> 2) inotify brings more entropy into the codebase, it is another thing
>> we need to take care of etc (however, it is all concentrated in one class
>> and pretty much "isolated" from everything else)
>> >>
>> >> I made this feature optional and it is turned off by default so people
>> need to explicitly opt-in into this so we are not forcing it on anybody.
>> >>
>> >> If we do not want to go with inotify, another option would be to have
>> a background thread which would periodically check if a manifest exists on
>> a disk, if it does not, then a snapshot does not either. While this works,
>> what I do not like about this is that the primary reason we moved it to
>> memory was to bypass IO as much as possible yet here we would introduce
>> another check which would go to disk, and this would be done periodically,
>> which beats the whole purpose. If an operator lists snapshots once a week
>> and there is a background check running every 10 minutes (for example),
>> then the cummulative number of IO operations migth be bigger than us just
>> doing nothing at all. For this reason, if we do not want to go with
>> inotify, I would also not implement any automatic background check. Instead
>> of that, there would be SnapshotManagerMbean#refresh() method which would
>> just forcibly reload all snapshots from scratch. I think that manual
>> deletion of snapshots is not something a user would do regularly, snapshots
>> are meant to be removed via nodetool or JMX. If manual removal ever happens
>> then in order to make it synchronized again, the refreshing of these
>> snapshots would be required. There might be an additional flag in nodetool
>> listsnapshots, once specified, refreshing would be done, otherwise not.
>> >>
>> >> How does this all sound to people?
>> >>
>> >> Regards
>> >
>>
>>
>>

Re: [DISCUSS] inotify for detection of manually removed snapshots

Reply via email to