I lean towards the documentation approach rather than complicating the implementation.
For me personally: I regularly use shell commands to operate on snapshots. That includes listing them. I probably should use nodetool for it all instead, though.

Jordan

On Fri, Aug 9, 2024 at 08:09 Štefan Miklošovič <smikloso...@apache.org> wrote:

> I understand and agree. It is just that it would be cool if we avoided the situation where there is a figurative ABC company which has these "bash scripts removing snapshots from cron by rm -rf every second Sunday at 3:00 am" because "that was their workflow for ages".
>
> I am particularly sensitive to this because Cassandra is very cautious when it comes to not disrupting the workflows already out there.
>
> I do not know how frequent this would be and whether somebody would start to complain. I mean ... they could still remove it by hand, right? It is just that listsnapshots would not be accurate anymore without refreshing it. I think that might be acceptable. It would be something else if we flat out forbade manual deletion, which we do not.
>
> On Fri, Aug 9, 2024 at 4:50 PM Bowen Song via dev <dev@cassandra.apache.org> wrote:
>
>> If we have the documentation in place, we can then consider the cache to be the master copy of the metadata and rely on it to be always accurate and up to date. If someone deletes snapshot files from the filesystem, they can't complain about Cassandra no longer working correctly - which is the same as if they had manually deleted some SSTable files (they shouldn't).
>>
>> On 09/08/2024 11:16, Štefan Miklošovič wrote:
>>
>> We could indeed do that. Does your suggestion mean that there should not be a problem with caching it all once it is explicitly stated like that?
>>
>> On Fri, Aug 9, 2024 at 12:01 PM Bowen Song via dev <dev@cassandra.apache.org> wrote:
>>
>>> Has anyone considered simply updating the documentation to say this?
>>>
>>> "Removing the snapshot files directly from the filesystem may break things. Always use the `nodetool` command or JMX to remove snapshots."
>>>
>>> On 09/08/2024 09:18, Štefan Miklošovič wrote:
>>>
>>> If we consider caching it all to be too much, we could make caching an option an admin would need to opt into. There might be a flag in cassandra.yaml; once enabled, snapshots would be kept in memory, otherwise they would be loaded from disk as before, so people can decide whether caching is enough for them or they want the previous behaviour (the default would be the previous behaviour). This puts additional complexity into SnapshotManager, but it should in general be doable.
>>>
>>> Let me know what you think. I would really like to have this resolved; 18111 brings a lot of code cleanup and simplifies things a lot.
>>>
>>> On Wed, Aug 7, 2024 at 11:30 PM Josh McKenzie <jmcken...@apache.org> wrote:
>>>
>>>> If you have a lot of snapshots and have, for example, a metric monitoring them and their sizes, creating the metric can cause performance degradation if you don't cache it. We added the cache because we saw this happen to databases more than once.
>>>>
>>>> I mean, I believe you, I'm just surprised that querying metadata for files plus some basic computation leads to hundreds of ms of pause time even on systems with a lot of files. Aren't most / all of these values cached at the filesystem layer, so we're basically just talking tomato / tomahto caching systems, either one we maintain or one the OS maintains?
>>>>
>>>> Or is there really just a count of files well outside what I'm thinking here?
>>>>
>>>> Anyway, not trying to cause a ruckus and make needless noise, just trying to learn. ;)
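To make the cost being debated here concrete, below is a rough, illustrative sketch (not Cassandra's actual implementation; the class and method names are made up) of what an uncached "true size of all snapshots" metric has to do on every scrape: walk each snapshot directory and stat every file in it.

```java
// Rough illustration (not Cassandra's code) of why an uncached snapshot-size
// metric gets expensive: every scrape walks every snapshot directory and
// stats every file in it.
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

public final class SnapshotSizeExample
{
    /** Sums file sizes under all given snapshot directories by walking them on every call. */
    public static long uncachedTrueSize(List<Path> snapshotDirs) throws IOException
    {
        long total = 0;
        for (Path dir : snapshotDirs)
        {
            try (Stream<Path> files = Files.walk(dir))
            {
                total += files.filter(Files::isRegularFile)
                              .mapToLong(p -> {
                                  try
                                  {
                                      return Files.size(p); // one stat() per file, per metric scrape
                                  }
                                  catch (IOException e)
                                  {
                                      throw new UncheckedIOException(e);
                                  }
                              })
                              .sum();
            }
        }
        return total;
    }
}
```

With tens of thousands of snapshot files, each scrape becomes tens of thousands of stat calls, which is the behaviour the in-memory cache was added to avoid.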
>>>> On Wed, Aug 7, 2024, at 3:20 PM, Štefan Miklošovič wrote:
>>>>
>>>> On Wed, Aug 7, 2024 at 6:39 PM Yifan Cai <yc25c...@gmail.com> wrote:
>>>>
>>>> With WatcherService, when events are missed (which is to be expected), you will still need to list the files. It seems to me that WatcherService doesn't offer significant benefits in this case.
>>>>
>>>> Yeah, I think we will leave it out eventually.
>>>>
>>>> Regarding listing the directory with a refresh flag, my concern is the potential for abuse. End-users might/could always refresh before listing, which could undermine the purpose of caching. Perhaps Jeremiah can provide more insight on this.
>>>>
>>>> Well, by default it would not be refreshed every single time; you would need to opt into that. If there is a shop where users have direct access to the disks of Cassandra nodes and they are removing data manually, I do not know what to say - what are nodetool clearsnapshot and the JMX methods good for then? I do not think we can prevent people from shooting themselves in the foot if they are absolutely determined to do that.
>>>>
>>>> If they want to refresh every time, that would be equal to the current behaviour. It would be at most as "bad" as it is now.
>>>>
>>>> IMO, caching is best handled internally. I have a few UX-related questions:
>>>> - Is it valid or acceptable to return stale data? If so, end-users have to do some form of validation before consuming each snapshot, to account for potential deletions.
>>>>
>>>> Answer below.
>>>>
>>>> - Even if listsnapshots returns the most recent data, is it possible that some of the directories get deleted while end-users are accessing them? I think it is. It then forces end-users to do some validation first, similar to handling stale data.
>>>>
>>>> I think what you are trying to say is that when somebody lists snapshots at time T0 and somebody removes a snapshot manually at T1, the list of snapshots is no longer accurate? Sure, that is a thing. This is how it currently works.
>>>>
>>>> Now, we want to cache them, so if you clear a snapshot which is not physically there because somebody removed it manually, that should be a no-op; it will just be removed from the internal tracker. So, if it is on disk and in the cache and you clear it, then all is fine. It is fine too if it is no longer on disk and you clear it - then it is just removed internally. It would fail only in the case where you want to remove a snapshot which is not cached, regardless of whether it is on disk or not. The deletion of a non-existing snapshot ends with a failure; nothing should change in that regard, as this is the current behaviour too.
>>>>
>>>> I want to say that I did not write it completely correctly at the very beginning of this thread. Currently, we are caching only _expiring_ snapshots, because we need to know their time of removal so we can act on it later. _Normal_ snapshots are _not_ cached _yet_. I spent so much time with 18111 that I live in a reality where it is already in; I forgot this is not actually in place yet, but we are very close to that.
>>>>
>>>> OK, thank you all for your insights, I will NOT use inotify.
>>>>
>>>> Just my 2 cents.
>>>>
>>>> - Yifan
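A minimal sketch of the clearing semantics described above, assuming a hypothetical in-memory tracker (the class and method names are illustrative, not Cassandra's actual SnapshotManager API): clearing a cached snapshot removes it from disk if it is still there and always drops it from the cache, while clearing a snapshot the tracker does not know about fails, just as deleting a non-existing snapshot does today.

```java
// Illustrative sketch of the clearing semantics discussed above; names are
// hypothetical, not Cassandra's actual API.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;

public final class SnapshotTrackerSketch
{
    // snapshot tag -> snapshot directory
    private final Map<String, Path> cached = new ConcurrentHashMap<>();

    public void clearSnapshot(String tag) throws IOException
    {
        Path dir = cached.remove(tag);
        if (dir == null)
        {
            // Not cached at all (regardless of what is on disk): same failure as
            // deleting a non-existing snapshot today.
            throw new IllegalArgumentException("Unknown snapshot: " + tag);
        }
        if (Files.exists(dir))
        {
            deleteRecursively(dir); // normal path: remove the files
        }
        // If someone already removed the directory manually, dropping it from the
        // cache is enough - the clear becomes a no-op on disk.
    }

    private static void deleteRecursively(Path dir) throws IOException
    {
        try (Stream<Path> paths = Files.walk(dir))
        {
            // delete children before parents; return value of delete() ignored in this sketch
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> p.toFile().delete());
        }
    }
}
```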
>>>> On Wed, Aug 7, 2024 at 6:03 AM Štefan Miklošovič <smikloso...@apache.org> wrote:
>>>>
>>>> Yes, for example as reported here:
>>>>
>>>> https://issues.apache.org/jira/browse/CASSANDRA-13338
>>>>
>>>> People who are charting this in monitoring dashboards might also hit this.
>>>>
>>>> On Wed, Aug 7, 2024 at 2:59 PM J. D. Jordan <jeremiah.jor...@gmail.com> wrote:
>>>>
>>>> If you have a lot of snapshots and have, for example, a metric monitoring them and their sizes, creating the metric can cause performance degradation if you don't cache it. We added the cache because we saw this happen to databases more than once.
>>>>
>>>> > On Aug 7, 2024, at 7:54 AM, Josh McKenzie <jmcken...@apache.org> wrote:
>>>> >
>>>> >> Snapshot metadata are currently stored in memory / they are cached so we do not need to go to disk every single time we want to list them; the more snapshots we have, the worse it gets.
>>>> >
>>>> > Are we enumerating our snapshots somewhere on the hot path, or is this performance concern misplaced?
>>>> >
>>>> >> On Wed, Aug 7, 2024, at 7:44 AM, Štefan Miklošovič wrote:
>>>> >>
>>>> >> Snapshot metadata are currently stored in memory / they are cached so we do not need to go to disk every single time we want to list them; the more snapshots we have, the worse it gets.
>>>> >>
>>>> >> When a snapshot is _manually_ removed from disk - not via nodetool clearsnapshot, just by rm -rf on the respective snapshot directory - such a snapshot will still be visible in nodetool listsnapshots. Manual removal of a snapshot might happen e.g. by accident, or by some "impatient" operator who just goes to the disk and removes it there instead of using nodetool or the respective JMX method.
>>>> >>
>>>> >> To improve UX here, what I came up with is that we might use Java's WatchService, where each snapshot dir would be registered. WatchService is part of Java; it uses the inotify subsystem which the Linux kernel offers. The result is that once a snapshot dir is registered to be watched and it is then removed, we are notified about it via inotify / WatchService, so we can react and remove the in-memory representation so it is no longer visible in the output.
>>>> >>
>>>> >> While this works, there are some questions / concerns:
>>>> >>
>>>> >> 1) What do people think about inotify in general? I tested this on 10k snapshots and it seems to work just fine; nevertheless, there is in general no strong guarantee that every single event will come through, and there is also a family of kernel parameters around this where more tuning can be done, etc. It is also questionable how this will behave on systems other than Linux (Mac etc.). While the JRE on other platforms also implements this, I am not completely sure those implementations are quality-wise on par with the Linux one. There is a history of not-so-great implementations on other systems (events not coming through on Macs etc.), and while I think we are safe on Linux, I am not sure we want to go with this elsewhere.
>>>> >> 2) inotify brings more entropy into the codebase; it is another thing we need to take care of, etc. (however, it is all concentrated in one class and pretty much "isolated" from everything else).
>>>> >>
>>>> >> I made this feature optional and it is turned off by default, so people need to explicitly opt into it and we are not forcing it on anybody.
>>>> >>
>>>> >> If we do not want to go with inotify, another option would be to have a background thread which would periodically check whether a manifest exists on disk; if it does not, then the snapshot does not exist either. While this works, what I do not like about it is that the primary reason we moved it to memory was to bypass IO as much as possible, yet here we would introduce another check which goes to disk, and it would run periodically, which defeats the whole purpose. If an operator lists snapshots once a week and there is a background check running every 10 minutes (for example), then the cumulative number of IO operations might be bigger than if we just did nothing at all. For this reason, if we do not want to go with inotify, I would also not implement any automatic background check. Instead, there would be a SnapshotManagerMbean#refresh() method which would just forcibly reload all snapshots from scratch. I think that manual deletion of snapshots is not something a user would do regularly; snapshots are meant to be removed via nodetool or JMX. If manual removal ever happens, then in order to make things synchronized again, refreshing the snapshots would be required. There might be an additional flag in nodetool listsnapshots; once specified, refreshing would be done, otherwise not.
>>>> >>
>>>> >> How does this all sound to people?
>>>> >>
>>>> >> Regards
>>>> >
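For reference, a minimal sketch of the WatchService approach described in the quoted proposal, assuming a hypothetical SnapshotWatcher class and an onSnapshotDirectoryDeleted hook (neither is the actual Cassandra API): the parent "snapshots" directory of a table is registered, and an ENTRY_DELETE event for a child directory drops the corresponding entry from the in-memory tracker.

```java
// Sketch of the WatchService idea; class and hook names are illustrative only.
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public final class SnapshotWatcher implements Runnable
{
    private final WatchService watchService;

    public SnapshotWatcher() throws IOException
    {
        this.watchService = FileSystems.getDefault().newWatchService();
    }

    /** Watch a table's "snapshots" directory so we see ENTRY_DELETE for its snapshot dirs. */
    public void register(Path snapshotsDir) throws IOException
    {
        snapshotsDir.register(watchService, StandardWatchEventKinds.ENTRY_DELETE);
    }

    @Override
    public void run()
    {
        while (!Thread.currentThread().isInterrupted())
        {
            WatchKey key;
            try
            {
                key = watchService.take(); // blocks until events are available
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                return;
            }
            Path watchedDir = (Path) key.watchable();
            for (WatchEvent<?> event : key.pollEvents())
            {
                if (event.kind() == StandardWatchEventKinds.OVERFLOW)
                    continue; // events were lost; only a full refresh can resynchronize here
                Path deleted = watchedDir.resolve((Path) event.context());
                onSnapshotDirectoryDeleted(deleted); // hypothetical hook: drop from the in-memory tracker
            }
            if (!key.reset())
            {
                // the watched directory itself is gone; its snapshots should be dropped as well
            }
        }
    }

    private void onSnapshotDirectoryDeleted(Path snapshotDir)
    {
        // In the proposal, this would remove the cached metadata for the snapshot
        // so it no longer appears in `nodetool listsnapshots`.
    }
}
```

The OVERFLOW branch is where the "missed events" concern raised earlier in the thread shows up: when the kernel drops events, the only safe recovery is a full re-listing, i.e. exactly the refresh() fallback discussed above.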