[
https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870085#comment-17870085
]
Paulo Motta edited comment on CASSANDRA-18111 at 8/1/24 5:24 AM:
-----------------------------------------------------------------
(I structured this review into multiple sections to hopefully make it easier to
discuss)
I'd like to restate and discuss the goals of this ticket to ensure we're on the
same page:
* *✅ Goal 1: Improve performance of {color:#ff0000}+nodetool
listsnapshots+{color} / {color:#ff0000}SELECT * FROM
system_views.snapshots{color} by avoiding expensive disk traversal when listing
snapshots*
To validate that the proposed patch achieves this goal, I created a rough
benchmark comparing listsnapshots performance across the following
implementations:
* {*}listsnapshots_disk{*}: Old implementation fetching snapshots from disk in
every call to listsnapshots
* {*}listsnapshots_cached{*}: New cached implementation
* {*}listsnapshots_cached_checkexists{*}: New cached implementation + check
manifest file exists during fetch
The benchmark consists of a simple JUnit test ([code
here|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/test/unit/org/apache/cassandra/service/snapshot/SnapshotManagerTest.java])
that fetches 100 snapshots of 10 tables with 10 sstables each, 1000 times for
each implementation.
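The shape of the benchmark can be sketched roughly as below. This is a minimal illustrative sketch, not the actual SnapshotManagerTest code: the class and method names (ListSnapshotsBenchSketch, scanDisk) are hypothetical stand-ins, and absolute timings will of course differ from the real disk-backed implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the benchmark shape: N fetches against a per-call
// "disk scan" stub versus a cache that is populated once and reused.
public class ListSnapshotsBenchSketch {
    // Stand-in for a disk traversal: rebuilds the whole listing on every call,
    // which is what the old listsnapshots_disk implementation effectively does.
    static Map<String, List<String>> scanDisk(int snapshots, int tables) {
        Map<String, List<String>> result = new HashMap<>();
        for (int s = 0; s < snapshots; s++) {
            List<String> dirs = new ArrayList<>();
            for (int t = 0; t < tables; t++)
                dirs.add("ks/table" + t + "/snapshots/snap" + s);
            result.put("snap" + s, dirs);
        }
        return result;
    }

    public static void main(String[] args) {
        int snapshots = 100, tables = 10, iterations = 1000;

        // listsnapshots_cached: build the listing once, reuse it on every fetch.
        Map<String, List<String>> cache = new ConcurrentHashMap<>(scanDisk(snapshots, tables));

        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++)
            scanDisk(snapshots, tables);        // listsnapshots_disk: rebuild each time
        long diskNs = System.nanoTime() - t0;

        t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++)
            cache.size();                       // listsnapshots_cached: cheap cache read
        long cachedNs = System.nanoTime() - t0;

        System.out.println("disk=" + diskNs + "ns cached=" + cachedNs + "ns");
    }
}
```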
I got the following test execution times on my modest SSD laptop for each
implementation:
* {*}listsnapshots_disk{*}: 37 seconds
* {*}listsnapshots_cached{*}: 36 milliseconds
* {*}listsnapshots_cached_checkexists{*}: 4 seconds
The *listsnapshots_cached* results indicate that caching snapshots greatly
improves *listsnapshots* speed compared to the current *listsnapshots_disk*
implementation, as expected, which accomplishes *Goal 1* and justifies this
patch.
The additional snapshot manifest existence check from
*listsnapshots_cached_checkexists* adds considerable overhead in comparison to
{*}listsnapshots_cached{*}, but it's still significantly faster than the
previous *listsnapshots_disk* implementation.
* *⚠️ Goal 2: Consolidate / centralize "as much as possible" snapshot logic in
SnapshotManager (CASSANDRA-18271)*
While this patch makes progress towards this goal, there is still a
considerable amount of snapshot logic in *StorageService* and
{*}ColumnFamilyStore{*}. See the discussion for each subsystem below:
*A) StorageService:* there is some snapshot handling logic in at least the
following methods:
* [takeSnapshot|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/src/java/org/apache/cassandra/service/StorageService.java#L2735]
* [getSnapshotDetails|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/src/java/org/apache/cassandra/service/StorageService.java#L3020]
* [trueSnapshotsSize|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/src/java/org/apache/cassandra/service/StorageService.java#L3042]
I think we could simplify a great deal of code by moving the remaining snapshot
logic from StorageService to SnapshotManager and creating a dedicated
[SnapshotManagerMBean|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/src/java/org/apache/cassandra/service/snapshot/SnapshotManagerMBean.java]
to expose snapshot methods via JMX moving forward. WDYT?
This would allow refactoring and simplifying some snapshot logic, for example
unifying implementations of
[takeMultipleTableSnapshot|https://github.com/apache/cassandra/blob/f95c1b5bb3d6f8728a00e13ca81993e12a9b14cd/src/java/org/apache/cassandra/service/StorageService.java#L2914]
and
[takeSnapshot|https://github.com/apache/cassandra/blob/f95c1b5bb3d6f8728a00e13ca81993e12a9b14cd/src/java/org/apache/cassandra/service/StorageService.java#L2868].
The proposal above would help retire snapshot logic from StorageService and
eventually remove the deprecated snapshot handling methods from
StorageServiceMBean. I'm happy to take these suggestions to a follow-up ticket,
but wanted to hear your thoughts on this refactoring proposal.
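To make the proposal concrete, a dedicated MBean could look something like the sketch below. All names and signatures here are illustrative assumptions on my part, not the actual interface in the patch; the point is simply that a single takeSnapshot entry point taking "ks" or "ks.table" entities plus an options map could subsume both takeSnapshot and takeMultipleTableSnapshot.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a dedicated SnapshotManagerMBean (illustrative names,
// not the actual patch API): all snapshot operations exposed in one place
// instead of being spread across StorageServiceMBean.
public interface SnapshotManagerMBeanSketch {
    // Unified entry point covering both takeSnapshot and
    // takeMultipleTableSnapshot: a tag plus any number of "ks" or "ks.table"
    // targets, with options such as skipFlush or a TTL.
    void takeSnapshot(String tag, Map<String, String> options, String... entities);

    void clearSnapshot(String tag, String... keyspaces);

    // Served from the in-memory snapshot cache rather than a disk traversal.
    Map<String, List<String>> listSnapshots();

    long trueSnapshotsSize();
}
```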
*B) ColumnFamilyStore:* there is a fundamental coupling between
ColumnFamilyStore and snapshot creation, since snapshot creation requires
flushing and locking sstables while creating the hardlinks. I don't think we
can fully remove this dependency but maybe there's room for further
cleanup/improvement in a follow-up ticket.
*⚠️* *SnapshotWatcher*
I am a bit concerned by the additional complexity added by SnapshotWatcher and
reliance on WatchService's / inotify implementation to detect when a snapshot
was manually removed from outside the process.
How about checking whether the manifest file exists periodically or during
fetch, if the user wants to enable this detection? This seems relatively cheap
based on the *listsnapshots_cached_checkexists* results, while being
considerably simpler than the SnapshotWatcher machinery for achieving the same
purpose.
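The check-on-fetch alternative could be as small as the sketch below. This is a minimal illustrative sketch with hypothetical names (CachedSnapshotsSketch, manifestsByTag), not code from the patch: before returning cached snapshots, drop any entry whose manifest no longer exists on disk, i.e. the snapshot was removed out-of-band.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical sketch: a snapshot cache that prunes stale entries during
// fetch instead of relying on a WatchService/inotify-based watcher.
public class CachedSnapshotsSketch {
    private final ConcurrentHashMap<String, Path> manifestsByTag = new ConcurrentHashMap<>();

    public void add(String tag, Path manifest) {
        manifestsByTag.put(tag, manifest);
    }

    // listsnapshots_cached_checkexists: drop entries whose manifest was
    // deleted out-of-band, then serve the rest from the cache.
    public List<String> listSnapshots() {
        manifestsByTag.entrySet().removeIf(e -> !Files.exists(e.getValue()));
        return manifestsByTag.keySet().stream().sorted().collect(Collectors.toList());
    }
}
```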
*Additional comment on Keyspace-level snapshot atomicity*
As a general comment, while reviewing this I noticed that snapshot atomicity is
not guaranteed across multiple tables in the same keyspace-level snapshot. This
could potentially lead to coherence issues if a user expects a keyspace-level
snapshot to reflect a consistent state across multiple tables. Maybe it makes
sense to support atomic keyspace-level snapshots in the future to provide this
guarantee.
*Snapshots with multiple data directories* (follow-up to this comment)
bq. If we create a manifest per snapshot dir, we would need to specify only
these sstables which are indeed located in that snapshot dir only (in case of a
multi data dirs scenario). Right now, a manifest contains all sstables in its
"files" field but if we were to just blindly copy this manifest to each dir,
then "files" would contain sstables which are not in that specific snapshot dir
where that manifest is.
Do we need to change the way manifests are handled for multiple data
directories in this ticket? I think this would be a departure from the current
behavior, where snapshot manifests list files across all data directories.
I think we should continue including files across all directories in the
snapshot manifest as done currently, because this allows verifying the
consistency of the snapshot contents. For example, with per-directory
manifests, if one snapshot subdirectory goes missing there would be no way to
detect that the snapshot is missing data by looking at the manifest file.
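The verification this enables could be as simple as the sketch below, assuming a hypothetical helper (ManifestConsistencySketch.missingFiles, not part of the patch) that takes the full file list from the manifest: because the manifest spans all data directories, a missing snapshot subdirectory shows up as absent entries.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: with a manifest that lists files across all data
// directories, snapshot completeness is checkable by testing each listed
// file for existence.
public class ManifestConsistencySketch {
    // Returns the manifest entries whose files are absent on disk; an empty
    // result means the snapshot contents match the manifest.
    public static List<Path> missingFiles(List<Path> manifestFiles) {
        return manifestFiles.stream()
                            .filter(p -> !Files.exists(p))
                            .collect(Collectors.toList());
    }
}
```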
> Centralize all snapshot operations to SnapshotManager and cache snapshots
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-18111
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18111
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Snapshots
> Reporter: Paulo Motta
> Assignee: Stefan Miklosovic
> Priority: Normal
> Fix For: 5.x
>
> Time Spent: 4h 50m
> Remaining Estimate: 0h
>
> Every time {{nodetool listsnapshots}} is called, all data directories are
> scanned to find snapshots, which is inefficient.
> For example, fetching the
> {{org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize}} metric
> can take half a second (CASSANDRA-13338).
> This improvement will also allow snapshots to be efficiently queried via
> virtual tables (CASSANDRA-18102).
> In order to do this, we should:
> a) load all snapshots from disk during initialization
> b) keep a collection of snapshots on {{SnapshotManager}}
> c) update the snapshots collection anytime a new snapshot is taken or cleared
> d) detect when a snapshot is manually removed from disk.