[ 
https://issues.apache.org/jira/browse/CASSANDRA-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529680#comment-17529680
 ] 

Paulo Motta edited comment on CASSANDRA-16843 at 4/28/22 11:29 PM:
-------------------------------------------------------------------

The reason why snapshots of "dropped tables" are omitted from the "nodetool 
listsnapshots" output above is because the prior implementation relied on the 
mechanics of 
[ColumnFamilyStore|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L2240=]
 and 
[Directories|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/Directories.java#L964=]
 to list snapshots.

Since dropped tables no longer have an associated {{ColumnFamilyStore}} object, 
it's not possible to list snapshots of dropped tables in the current 
implementation.

[This patch|https://github.com/apache/cassandra/pull/1595] re-architects the 
snapshot listing logic to be fully decoupled from 
{{{}ColumnFamilyStore{}}}/{{{}Directories{}}} and rely solely on the snapshot 
directory structure, which currently has this format:
 * {{$data_dir/$ks_name/$table_name-$table_uuid/snapshots/$tag}}

The new snapshot discovery logic is mostly contained in the 
[SnapshotLoader|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/src/java/org/apache/cassandra/service/snapshot/SnapshotLoader.java]
 class, which traverses the data directory [looking for snapshot directories 
matching the pattern 
above|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/src/java/org/apache/cassandra/service/snapshot/SnapshotLoader.java#L102=].

I updated 
[StorageService.getSnapshotDetails|https://github.com/apache/cassandra/pull/1595/files#diff-9bf2c26bc294ef9085e16bf287490223665eaa2eb8ec24bcf5bd8653c713644bR4131]
 which is used by {{nodetool listsnapshots}} to use new {{SnapshotLoader}} 
class to list snapshots.

The snapshot true size computation was previously dependent on logic from 
[Directories|https://github.com/apache/cassandra/blob/bb3749f2bb8282f67375c67712d8e3ca1f085879/src/java/org/apache/cassandra/db/Directories.java#L1153],
 so in order to fully decouple snapshot listing from {{Directories}}, I 
[simplified the computation of snapshot true 
size|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/src/java/org/apache/cassandra/service/snapshot/TableSnapshot.java#L282]
 to only include files which do not have a corresponding "live" file on 
{{{}$data_dir/$ks_name/$table_name-$table_uuid{}}}.

This simplification to the snapshot true size computation fixed two additional 
issues with the previous implementation (illustrated with examples in the 
previous comment):
1) Snapshot true size did not include "schema.cql" and "manifest.json" sizes
2) Snapshot true size did not include secondary indexes (CASSANDRA-17357)

I performed other simplifications and refactorings along the way, but given the 
proximity to the 4.1 freeze, I prepared a leaner version of the original patch 
to facilitate review.

After this is merged I will prepare another set of follow-up patches (for next 
release) with refactorings and simplifications in the snapshot management 
module that will be enabled by this change.

Testing:
 - [dtest to check if snapshot of dropped tables are included on 
listsnapshots|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/test/distributed/org/apache/cassandra/distributed/test/SnapshotsTest.java#L195=]
 - 
[SnapshotLoaderTest|https://github.com/apache/cassandra/blob/e3fd12ab2a112dad5cdba7cda226aab12f0c2c04/test/unit/org/apache/cassandra/service/snapshot/SnapshotLoaderTest.java]
 - [Test to check that manifest and schema file sizes are included in true size 
computation|https://github.com/apache/cassandra/pull/1595/files#diff-ef5be0b69d0440b76021282c4b24bad69770ef9419be260df2169f49921db377R291]
 - [Update DirectoriesTest.testSecondaryIndexDirectories to include 2i on true 
size 
computation|https://github.com/apache/cassandra/pull/1595/files#diff-1948a455b59a97d8d1ab3d2cb5388190c1cbb8e8081e3ac97bfc0c51a7ef64e3R421]
 - [testGetLiveFileFromSnapshotFile (used by new true size 
computation)|https://github.com/apache/cassandra/pull/1595/files#diff-d349fb289ec10bece5531f1630cd2bcc55665b5cf3cd59cfcfb4dc93f288a571R233]


was (Author: paulo):
The reason why snapshots of "dropped tables" are omitted from the "nodetool 
listsnapshots" output above is because the prior implementation relied on the 
mechanics of 
[ColumnFamilyStore|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L2240=]
 and 
[Directories|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/Directories.java#L964=]
 to list snapshots.

Since dropped tables no longer have an associated {{ColumnFamilyStore}} object, 
it's not possible to list snapshots of dropped tables in the current 
implementation.

[This patch|https://github.com/apache/cassandra/pull/1595] re-architects the 
snapshot listing logic to be fully decoupled from 
{{{}ColumnFamilyStore{}}}/{{{}Directories{}}} and rely solely on the snapshot 
directory structure, which currently has this format:
 * {{$data_dir/$ks_name/$table_name-$table_uuid/snapshots/$tag}}

The new snapshot discovery logic is mostly contained in the 
[SnapshotLoader|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/src/java/org/apache/cassandra/service/snapshot/SnapshotLoader.java]
 class, which traverses the data directory [looking for snapshot directories 
matching the pattern 
above|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/src/java/org/apache/cassandra/service/snapshot/SnapshotLoader.java#L102=].

I updated 
[StorageService.getSnapshotDetails|https://github.com/apache/cassandra/pull/1595/files#diff-9bf2c26bc294ef9085e16bf287490223665eaa2eb8ec24bcf5bd8653c713644bR4131]
 which is used by {{nodetool listsnapshots}} to use new {{SnapshotLoader}} 
class to list snapshots.

The snapshot true size computation was previously dependent on logic from 
[Directories|https://github.com/apache/cassandra/blob/bb3749f2bb8282f67375c67712d8e3ca1f085879/src/java/org/apache/cassandra/db/Directories.java#L1153],
 so in order to fully decouple snapshot listing from {{Directories}}, I 
[simplified the computation of snapshot true 
size|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/src/java/org/apache/cassandra/service/snapshot/TableSnapshot.java#L282]
 to only include files which do not have a corresponding "live" file on 
{{{}$data_dir/$ks_name/$table_name-$table_uuid{}}}.

This simplification to the snapshot true size computation fixed two additional 
issues with the previous implementation (illustrated with examples in the 
previous comment):
1) Snapshot true size did not include "schema.cql" and "manifest.json" sizes
2) Snapshot true size did not include secondary indexes (CASSANDRA-17357)

I performed other simplifications and refactorings along the way, but given the 
proximity to the 4.1 freeze, I prepared a leaner version of the original patch 
to facilitate review.

After this is merged I will prepare another set of follow-up patches (for next 
release) with refactorings and simplifications in the snapshot management 
module that will be enabled by this change.

Testing:
 - [dtest to check if snapshot of dropped tables are included on 
listsnapshots|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/test/distributed/org/apache/cassandra/distributed/test/SnapshotsTest.java#L195=]
 - 
[SnapshotLoaderTest|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/test/unit/org/apache/cassandra/service/snapshot/SnapshotLoaderTest.java]
 - [Test to check that manifest and schema file sizes are included in true size 
computation|https://github.com/apache/cassandra/pull/1595/files#diff-ef5be0b69d0440b76021282c4b24bad69770ef9419be260df2169f49921db377R291]
 - [Update DirectoriesTest.testSecondaryIndexDirectories to include 2i on true 
size 
computation|https://github.com/apache/cassandra/pull/1595/files#diff-1948a455b59a97d8d1ab3d2cb5388190c1cbb8e8081e3ac97bfc0c51a7ef64e3R421]
 - [testGetLiveFileFromSnapshotFile (used by new true size 
computation)|https://github.com/apache/cassandra/pull/1595/files#diff-d349fb289ec10bece5531f1630cd2bcc55665b5cf3cd59cfcfb4dc93f288a571R233]

> List snapshots of dropped tables
> --------------------------------
>
>                 Key: CASSANDRA-16843
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16843
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Snapshots
>            Reporter: James Brown
>            Assignee: Paulo Motta
>            Priority: Normal
>             Fix For: 4.1
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Auto snapshots from dropped tables don't seem to show up in {{nodetool 
> listsnapshots}} (even though they do get cleared by {{nodetool 
> clearsnapshot}}). This makes them kind of annoying to clean up, since you 
> need to muck about in the data directory to find them.
> Erick on the mailing list said that this seems to be an oversight and that 
> clearsnapshot was fixed by 
> [CASSANDRA-6418|https://issues.apache.org/jira/browse/CASSANDRA-6418].
> I reproduced this both on 3.11.11 and 4.0.0.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to