[ https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870085#comment-17870085 ]

Paulo Motta edited comment on CASSANDRA-18111 at 8/1/24 5:24 AM:
-----------------------------------------------------------------

(I structured this review into multiple sections to hopefully make it easier to 
discuss)

I'd like to restate and discuss the goals of this ticket to ensure we're on the 
same page:
 *  *✅ Goal 1: Improve performance of {color:#ff0000}+nodetool 
listsnapshots+{color} / {color:#ff0000}SELECT * FROM 
system_views.snapshots{color} by avoiding expensive disk traversal when listing 
snapshots*

To validate this goal is being achieved with the proposed patch, I created a 
rough benchmark comparing listsnapshot performance in the following 
implementations:
 * {*}listsnapshots_disk{*}: Old implementation fetching snapshots from disk on every call to listsnapshots
 * {*}listsnapshots_cached{*}: New cached implementation
 * {*}listsnapshots_cached_checkexists{*}: New cached implementation + check 
manifest file exists during fetch

The benchmark consists of a simple JUnit test ([code here|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/test/unit/org/apache/cassandra/service/snapshot/SnapshotManagerTest.java]) fetching 100 snapshots of 10 tables (with 10 sstables each) 1000 times for each implementation.
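
For reference, the measurement loop has roughly the following shape (a sketch only; the helper names below are simplified stand-ins, not the exact code of the linked test):

{code:java}
// Rough shape of the benchmark loop (illustrative only -- the helpers are
// simplified stand-ins, not the exact code of the linked SnapshotManagerTest).
import java.util.concurrent.TimeUnit;
import org.junit.Test;

public class SnapshotListingBenchmark
{
    @Test
    public void benchmarkListSnapshots()
    {
        // 100 snapshots of 10 tables with 10 sstables each
        createSnapshots(100, 10, 10);

        long start = System.nanoTime();
        for (int i = 0; i < 1000; i++)
            listSnapshots(); // swap in the disk / cached / cached+checkexists variant
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("1000 listsnapshots calls took " + elapsed + " ms");
    }

    private void createSnapshots(int snapshots, int tables, int sstablesPerTable) { /* setup elided */ }
    private void listSnapshots() { /* implementation under test */ }
}
{code}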

I got the following test execution times on my modest SSD laptop for each 
implementation:
 * {*}listsnapshots_disk{*}: 37 seconds
 * {*}listsnapshots_cached{*}: 36 milliseconds
 * {*}listsnapshots_cached_checkexists{*}: 4 seconds

As expected, the *listsnapshots_cached* results indicate that caching snapshots greatly improves *listsnapshots* speed (by roughly 1000x in this benchmark) compared to the current *listsnapshots_disk* implementation, which accomplishes *Goal 1* and justifies this patch.

The additional snapshot manifest existence check in *listsnapshots_cached_checkexists* adds considerable overhead compared to {*}listsnapshots_cached{*} (4 seconds vs. 36 milliseconds), but it is still roughly an order of magnitude faster than the previous *listsnapshots_disk* implementation.
 * *⚠️ Goal 2: Consolidate / centralize "as much as possible" snapshot logic in 
SnapshotManager (CASSANDRA-18271)*

While this patch makes progress towards this goal, there is still a considerable amount of snapshot logic in *StorageService* and {*}ColumnFamilyStore{*}. See the discussion for each subsystem below:

*A) StorageService:* there is some snapshot handling logic in at least the 
following methods:
 * 
[takeSnapshot|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/src/java/org/apache/cassandra/service/StorageService.java#L2735]
 * 
[getSnapshotDetails|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/src/java/org/apache/cassandra/service/StorageService.java#L3020]
 * 
[trueSnapshotsSize|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/src/java/org/apache/cassandra/service/StorageService.java#L3042]

I think we could simplify a great deal of code by moving the remaining snapshot logic from StorageService to SnapshotManager and creating a dedicated [SnapshotManagerMBean|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/src/java/org/apache/cassandra/service/snapshot/SnapshotManagerMBean.java] to expose snapshot methods via JMX going forward. WDYT?
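
For illustration, a minimal sketch of what such a dedicated MBean could look like (the signatures below are assumptions mirroring the existing StorageServiceMBean snapshot methods, not the patch's actual API):

{code:java}
import java.io.IOException;
import java.util.Map;
import javax.management.openmbean.TabularData;

// Illustrative sketch only -- signatures mirror the existing StorageServiceMBean
// snapshot methods and are not the patch's actual API.
public interface SnapshotManagerMBean
{
    /** Take a snapshot of the given entities (keyspaces or keyspace.table pairs). */
    void takeSnapshot(String tag, Map<String, String> options, String... entities) throws IOException;

    /** Remove the snapshot with the given tag from the given keyspaces. */
    void clearSnapshot(String tag, Map<String, Object> options, String... keyspaces) throws IOException;

    /** List snapshots, served from the in-memory cache instead of scanning disk. */
    Map<String, TabularData> listSnapshots(Map<String, String> options);

    /** Compute the true (post-hardlink) size of all snapshots on disk. */
    long trueSnapshotsSize();
}
{code}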

This would allow refactoring and simplifying some snapshot logic, for example 
unifying the implementations of 
[takeMultipleTableSnapshot|https://github.com/apache/cassandra/blob/f95c1b5bb3d6f8728a00e13ca81993e12a9b14cd/src/java/org/apache/cassandra/service/StorageService.java#L2914]
 and 
[takeSnapshot|https://github.com/apache/cassandra/blob/f95c1b5bb3d6f8728a00e13ca81993e12a9b14cd/src/java/org/apache/cassandra/service/StorageService.java#L2868].
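
To make the unification concrete, both legacy entry points could delegate to a single method along these lines (a sketch under the assumed SnapshotManagerMBean above, not the patch's code):

{code:java}
import java.io.IOException;
import java.util.Collections;

// Illustrative delegation only: both legacy entry points funnel into one
// unified method, since "keyspace" and "keyspace.table" entities can be
// handled by the same parsing code path.
public class StorageServiceSnapshotDelegation
{
    private final SnapshotManagerMBean snapshotManager; // hypothetical, see sketch above

    StorageServiceSnapshotDelegation(SnapshotManagerMBean snapshotManager)
    {
        this.snapshotManager = snapshotManager;
    }

    public void takeSnapshot(String tag, String... keyspaceNames) throws IOException
    {
        snapshotManager.takeSnapshot(tag, Collections.emptyMap(), keyspaceNames);
    }

    public void takeMultipleTableSnapshot(String tag, String... tableList) throws IOException
    {
        // tableList entries are "keyspace.table" pairs, handled by the same parser
        snapshotManager.takeSnapshot(tag, Collections.emptyMap(), tableList);
    }
}
{code}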

The proposal above would help retire snapshot logic from StorageService and 
eventually remove the deprecated snapshot handling methods from 
StorageServiceMBean. I'm happy to take these suggestions to a follow-up ticket, 
but wanted to hear your thoughts on this refactoring proposal.

*B) ColumnFamilyStore:* there is a fundamental coupling between 
ColumnFamilyStore and snapshot creation, since snapshot creation requires 
flushing and locking sstables while creating the hardlinks. I don't think we 
can fully remove this dependency, but maybe there's room for further cleanup/improvement in a follow-up ticket.
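
For context, the coupling boils down to a sequence like this (a heavily simplified sketch with hypothetical helper names, not the actual ColumnFamilyStore code):

{code:java}
import java.util.List;

// Heavily simplified sketch (hypothetical names, not the actual code) of why
// snapshot creation cannot leave ColumnFamilyStore entirely: the table must be
// flushed first, and its sstables pinned while hardlinks are created so that
// compaction cannot delete them mid-snapshot.
public class SnapshotCouplingSketch
{
    public void snapshot(String tag)
    {
        forceBlockingFlush();                             // memtable contents must be on disk
        for (SSTable sstable : pinnedCanonicalSSTables()) // refs held for the duration
            sstable.createLink(snapshotDirectoryFor(tag));
    }

    // Stubs standing in for ColumnFamilyStore internals:
    void forceBlockingFlush() {}
    List<SSTable> pinnedCanonicalSSTables() { return List.of(); }
    String snapshotDirectoryFor(String tag) { return "snapshots/" + tag; }
    interface SSTable { void createLink(String dir); }
}
{code}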



*⚠️* *SnapshotWatcher*

I am a bit concerned by the additional complexity introduced by SnapshotWatcher 
and its reliance on the WatchService/inotify implementation to detect when a 
snapshot is manually removed from outside the process.

How about checking whether the manifest file exists, either periodically or during fetch, when the user wants to enable this detection? This seems relatively cheap based on the *listsnapshots_cached_checkexists* results, while being considerably simpler than the SnapshotWatcher machinery for the same purpose.
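
Something along these lines (a sketch assuming a cached map of snapshots; the names are illustrative):

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the fetch-time alternative to SnapshotWatcher (illustrative names):
// evict cached entries whose manifest no longer exists on disk before listing.
public class CachedSnapshotListing
{
    private final Map<String, CachedSnapshot> cachedSnapshots = new HashMap<>();

    public synchronized List<CachedSnapshot> listSnapshots(boolean checkManifestExists)
    {
        if (checkManifestExists)
            cachedSnapshots.values().removeIf(s -> !s.manifestFile().exists()); // evict stale entries
        return new ArrayList<>(cachedSnapshots.values());
    }

    public interface CachedSnapshot { java.io.File manifestFile(); }
}
{code}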



*Additional comment on Keyspace-level snapshot atomicity*

As a general comment, while reviewing this I noticed that snapshot atomicity is 
not guaranteed across multiple tables in the same keyspace-level snapshot. This 
could potentially lead to coherence issues if a user expects a keyspace-level 
snapshot to reflect a consistent state across multiple tables. Maybe it makes 
sense to support atomic keyspace-level snapshots in the future to provide this 
guarantee.



*Snapshots with multiple data directories* (follow-up to this comment)

bq. If we create a manifest per snapshot dir, we would need to specify only 
these sstables which are indeed located in that snapshot dir only (in case of a 
multi data dirs scenario). Right now, a manifest contains all sstables in its 
"files" field but if we were to just blindly copy this manifest to each dir, 
then "files" would contain sstables which are not in that specific snapshot dir 
where that manifest is.

Do we need to change the way manifests are handled for multiple data 
directories in this ticket? I think this would be a departure from the current 
behavior, where snapshot manifests list files across all data directories.

I think we should continue including files across all directories in the snapshot manifest as done currently, because this allows verifying the consistency of the snapshot contents. For example, if one snapshot subdirectory goes missing, a global manifest makes the loss detectable, whereas with per-directory manifests there would be no way to detect that the snapshot is missing data by looking at the manifest files.
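
To illustrate the verification benefit: with a global manifest, a simple integrity check becomes possible (a sketch; the manifest is represented here as a plain list of file names):

{code:java}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch: a manifest listing files across all data directories makes a missing
// snapshot subdirectory detectable (illustrative, not the actual manifest API).
public class SnapshotIntegrityCheck
{
    /** Returns manifest entries that exist in none of the snapshot directories. */
    public static List<String> findMissingFiles(List<String> manifestFiles, List<File> snapshotDirs)
    {
        List<String> missing = new ArrayList<>();
        for (String file : manifestFiles)               // all sstables, across all dirs
            if (snapshotDirs.stream().noneMatch(dir -> new File(dir, file).exists()))
                missing.add(file);                      // present in no directory => data lost
        return missing;
    }
}
{code}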



> Centralize all snapshot operations to SnapshotManager and cache snapshots
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18111
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18111
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Snapshots
>            Reporter: Paulo Motta
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Every time {{nodetool listsnapshots}} is called, all data directories are 
> scanned to find snapshots, which is inefficient.
> For example, fetching the 
> {{org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize}} metric 
> can take half a second (CASSANDRA-13338).
> This improvement will also allow snapshots to be efficiently queried via 
> virtual tables (CASSANDRA-18102).
> In order to do this, we should:
> a) load all snapshots from disk during initialization
> b) keep a collection of snapshots on {{SnapshotManager}}
> c) update the snapshots collection anytime a new snapshot is taken or cleared
> d) detect when a snapshot is manually removed from disk.


