[
https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870143#comment-17870143
]
Stefan Miklosovic edited comment on CASSANDRA-18111 at 8/1/24 9:50 AM:
-----------------------------------------------------------------------
[~paulo] Thanks for the review!
I know there are some bits and pieces which are still there in
ColumnFamilyStore but I left them there because of backward compatibility.
These methods are exposed in Mbeans so I just left them there. I think that the
idea around the introduction of SnapshotManagerMbean makes sense and I dont
have any problem with it but it would mean that the methods in StorageService
etc would need to say and it would just call SnapshshotManager where all the
stuff would be put with additional logic left in StorageService etc ... We can
then deprecate the methods in StorageService and similar, sure.
for B) yes, there is a dependency here but I don't find it problematic too
much. I moved the core snapshotting method into SnapshotManager too. That
method is thread safe (or, at least, I have not identified any place where it
is not) so moving it to SnapshotManager should be safe and we unload some logic
from CFS as well.
Additional comment on Keyspace-level snapshot atomicity
Not sure about this, maybe example would be helpful here? We can indeed iterate
on this work in the future.
Snapshots with multiple data directories
I think that what the patch does makes a lot of sense. I consider not having a
manifest in each data dir quite an issue, border-line equaling that with a bug
because if we lose the directory where the manifest is located then we just
dont know anything about that snapshot anymore ... I do not know why we have
just one manifest per whole logical snapshot and it is not done per data dir
already. I would keep manifest-per-data-dir in place.
To elaborate on this "watching" feature more ... You know what ... what if we
just didn't do anything at all? I consider periodic scanning of snapshots by
some thread, e.g. every 5 minutes or so, inferior to just not doing anything at
all. The reason is that manual removal of a snapshot is not something an
operator would do regularly. This stuff is most probably just done by mistake
or similar. I also do not think that listing of snapshots (in connection with
manual removal of a snapshot dir(s)) is so frequently-done operation that it
justifies to run a background thread. When we do that, then if snapshots are
listed every 3 days (for example), then the amount of unnecessary scanning of
the disk if snapshots still exist is actually worse than us doing nothing. We
would cumulatively end up with more disk IO which beats the whole purpose of
doing this in-memory caching.
On the other hand, I consider it important that we are able to "reload" all
snapshots / refresh them. That indeed goes to the disk but it is an operation
an operator consciously chose to do instead of checking that regularly by us.
Your last sentence ... if we do that, then if a file is removed from one of the
directories, how would we detect that it is missing if it can be in a different
directory? We could not validate "snapshot per data dir", we would need to
validate the whole snapshot with all its data dirs at once. If this is
acceptable then we can include all files in one manifest PLUS the manifest
would be in each data dir.
was (Author: smiklosovic):
[~paulo] Thanks for the review!
I know there are some bits and pieces which are still there in
ColumnFamilyStore but I left them there because of backward compatibility.
These methods are exposed in Mbeans so I just left them there. I think that the
idea around the introduction of SnapshotManagerMbean makes sense and I dont
have any problem with it but it would mean that the methods in StorageService
etc would need to say and it would just call SnapshshotManager where all the
stuff would be put with additional logic left in StorageService etc ... We can
then deprecate the methods in StorageService and similar, sure.
for B) yes, there is a dependency here but I don't find it problematic too
much. I moved the core snapshotting method into SnapshotManager too. That
method is thread safe (or, at least, I have not identified any place where it
is not) so moving it to SnapshotManager should be safe and we unload some logic
from CFS as well.
Additional comment on Keyspace-level snapshot atomicity
Not sure about this, maybe example would be helpful here? We can indeed iterate
on this work in the future.
Snapshots with multiple data directories
I think that what the patch does makes a lot of sense. I consider not having a
manifest in each data dir quite an issue, border-line equaling that with a bug
because if we lose the directory where the manifest is located then we just
dont know anything about that snapshot anymore ... I do not know why we have
just one manifest per whole logical snapshot and it is not done per data dir
already. I would keep manifest-per-data-dir in place.
To elaborate on this "watching" feature more ... You know what ... what if we
just didn't do anything at all? I consider periodic scanning of snapshots by
some thread, e.g. every 5 minutes or so, inferior to just not doing anything at
all. The reason is that manual removal of a snapshot is not something an
operator would do regularly. This stuff is most probably just done by mistake
or similar. I also do not think that listing of snapshots is so frequently-done
operation that it justifies to run a background thread. When we do that, then
if snapshots are listed every 3 days (for example), then the amount of
unnecessary scanning of the disk if snapshots still exist is actually worse
than us doing nothing. We would cumulatively end up with more disk IO which
beats the whole purpose of doing this in-memory caching.
On the other hand, I consider it important that we are able to "reload" all
snapshots / refresh them. That indeed goes to the disk but it is an operation
an operator consciously chose to do instead of checking that regularly by us.
Your last sentence ... if we do that, then if a file is removed from one of the
directories, how would we detect that it is missing if it can be in a different
directory? We could not validate "snapshot per data dir", we would need to
validate the whole snapshot with all its data dirs at once. If this is
acceptable then we can include all files in one manifest PLUS the manifest
would be in each data dir.
> Centralize all snapshot operations to SnapshotManager and cache snapshots
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-18111
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18111
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Snapshots
> Reporter: Paulo Motta
> Assignee: Stefan Miklosovic
> Priority: Normal
> Fix For: 5.x
>
> Time Spent: 4h 50m
> Remaining Estimate: 0h
>
> Everytime {{nodetool listsnapshots}} is called, all data directories are
> scanned to find snapshots, what is inefficient.
> For example, fetching the
> {{org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize}} metric
> can take half a second (CASSANDRA-13338).
> This improvement will also allow snapshots to be efficiently queried via
> virtual tables (CASSANDRA-18102).
> In order to do this, we should:
> a) load all snapshots from disk during initialization
> b) keep a collection of snapshots on {{SnapshotManager}}
> c) update the snapshots collection anytime a new snapshot is taken or cleared
> d) detect when a snapshot is manually removed from disk.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]