[ https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867638#comment-17867638 ]
Paulo Motta commented on CASSANDRA-18111:
-----------------------------------------
While testing this I noticed that when a table's snapshot is split across
multiple data directories and one of those directories is manually removed,
the cleanup mechanism also removes the snapshot files from the other
directories.
When a snapshot is spread across multiple data directories, I think the intent
is for SnapshotManager to stop tracking the snapshot only when all of its
subdirectories have been removed. We don't want to clear the remaining snapshot
directories just because one of the subdirectories was manually removed.
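To make that rule concrete, here is a minimal sketch in Python. It is not Cassandra's actual Java code, and `should_stop_tracking` is a hypothetical name. It only illustrates the check: a snapshot stops being tracked when none of its subdirectories exist, and nothing is deleted on the user's behalf.

```python
import shutil
import tempfile
from pathlib import Path


def should_stop_tracking(snapshot_dirs):
    """True only when none of the snapshot's subdirectories exist any more."""
    return not any(Path(d).is_dir() for d in snapshot_dirs)


# Usage: mimic a snapshot split across two data directories (hypothetical layout).
root = Path(tempfile.mkdtemp())
dirs = [root / "data1" / "snapshots" / "test",
        root / "data2" / "snapshots" / "test"]
for d in dirs:
    d.mkdir(parents=True)

shutil.rmtree(dirs[1])                           # data2 copy removed by hand
after_one_removed = should_stop_tracking(dirs)   # data1 still exists, keep tracking

shutil.rmtree(dirs[0])                           # the last copy is gone too
after_all_removed = should_stop_tracking(dirs)   # nothing left to track

shutil.rmtree(root)                              # clean up the temporary tree
```

With this check, the manual removal of the data2 subdirectory would leave the data1 files untouched.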
Alternatively, we could consider a snapshot valid as long as its
"manifest.json" exists. This would require every snapshot to contain a
"manifest.json" in order to be tracked by SnapshotManager. I think this
is a fair requirement, because without the manifest it's not possible to tell
whether a snapshot was partially corrupted (i.e. some files were removed from
it).
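A rough sketch of the manifest-based alternative, again in Python rather than Cassandra's Java, with hypothetical helper names (`find_manifest`, `missing_files`). It assumes only the "files" field shown in the manifest in this comment: the snapshot counts as trackable while a "manifest.json" exists in one of its directories, and the manifest lets us report which listed files have gone missing.

```python
import json
import shutil
import tempfile
from pathlib import Path


def find_manifest(snapshot_dirs):
    """Return the first manifest.json found in the snapshot's directories, else None."""
    for d in snapshot_dirs:
        candidate = Path(d) / "manifest.json"
        if candidate.is_file():
            return candidate
    return None


def missing_files(snapshot_dirs):
    """Manifest-listed files absent from every directory; None if there is no manifest."""
    manifest = find_manifest(snapshot_dirs)
    if manifest is None:
        return None
    listed = json.loads(manifest.read_text())["files"]
    return sorted(f for f in listed
                  if not any((Path(d) / f).is_file() for d in snapshot_dirs))


# Usage: data1 holds the manifest, data2 holds the second SSTable.
root = Path(tempfile.mkdtemp())
data1 = root / "data1" / "snapshots" / "test"
data2 = root / "data2" / "snapshots" / "test"
data1.mkdir(parents=True)
data2.mkdir(parents=True)
(data1 / "oa-1-big-Data.db").touch()
(data2 / "oa-2-big-Data.db").touch()
(data1 / "manifest.json").write_text(
    json.dumps({"files": ["oa-2-big-Data.db", "oa-1-big-Data.db"]}))

before = missing_files([data1, data2])       # everything listed is present
shutil.rmtree(data2)
after = missing_files([data1, data2])        # the data2 file is reported missing
shutil.rmtree(data1)
no_manifest = missing_files([data1, data2])  # manifest gone: snapshot not trackable
shutil.rmtree(root)
```

Under this approach the snapshot stays tracked after the data2 removal, but it can be flagged as incomplete instead of being silently deleted.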
See example:
{code:none}
$ cat data/data1/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/snapshots/test/manifest.json
{
"files" : [ "oa-2-big-Data.db", "oa-1-big-Data.db" ],
"created_at" : "2024-07-22T01:53:47.026Z",
"expires_at" : null,
"ephemeral" : false
}
$ ls -ltra data/data1/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/snapshots/test/
total 48
-rw-rw-r-- 2 user user 16 Jul 21 21:53 oa-1-big-Filter.db
-rw-rw-r-- 2 user user 92 Jul 21 21:53 oa-1-big-Summary.db
-rw-rw-r-- 2 user user 20 Jul 21 21:53 oa-1-big-Index.db
-rw-rw-r-- 2 user user 10 Jul 21 21:53 oa-1-big-Digest.crc32
-rw-rw-r-- 2 user user 137 Jul 21 21:53 oa-1-big-Data.db
-rw-rw-r-- 2 user user 92 Jul 21 21:53 oa-1-big-TOC.txt
-rw-rw-r-- 2 user user 5429 Jul 21 21:53 oa-1-big-Statistics.db
-rw-rw-r-- 2 user user 47 Jul 21 21:53 oa-1-big-CompressionInfo.db
drwxrwxr-x 3 user user 4096 Jul 21 21:53 ..
drwxrwxr-x 2 user user 4096 Jul 21 21:53 .
-rw-rw-r-- 1 user user 149 Jul 21 21:53 manifest.json
$ ls -ltra data/data2/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/snapshots/test/
total 44
-rw-rw-r-- 2 user user 92 Jul 21 21:53 oa-2-big-Summary.db
-rw-rw-r-- 2 user user 20 Jul 21 21:53 oa-2-big-Index.db
-rw-rw-r-- 2 user user 16 Jul 21 21:53 oa-2-big-Filter.db
-rw-rw-r-- 2 user user 129 Jul 21 21:53 oa-2-big-Data.db
-rw-rw-r-- 2 user user 10 Jul 21 21:53 oa-2-big-Digest.crc32
-rw-rw-r-- 2 user user 47 Jul 21 21:53 oa-2-big-CompressionInfo.db
-rw-rw-r-- 2 user user 92 Jul 21 21:53 oa-2-big-TOC.txt
-rw-rw-r-- 2 user user 5430 Jul 21 21:53 oa-2-big-Statistics.db
drwxrwxr-x 3 user user 4096 Jul 21 21:53 ..
drwxrwxr-x 2 user user 4096 Jul 21 21:53 .
# Remove data2 manually, but keep data1
$ rm -rf data/data2/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/snapshots/test/
$ ls -ltra data/data1/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/snapshots/test/
total 48
-rw-rw-r-- 2 user user 16 Jul 21 21:53 oa-1-big-Filter.db
-rw-rw-r-- 2 user user 92 Jul 21 21:53 oa-1-big-Summary.db
-rw-rw-r-- 2 user user 20 Jul 21 21:53 oa-1-big-Index.db
-rw-rw-r-- 2 user user 10 Jul 21 21:53 oa-1-big-Digest.crc32
-rw-rw-r-- 2 user user 137 Jul 21 21:53 oa-1-big-Data.db
-rw-rw-r-- 2 user user 92 Jul 21 21:53 oa-1-big-TOC.txt
-rw-rw-r-- 2 user user 5429 Jul 21 21:53 oa-1-big-Statistics.db
-rw-rw-r-- 2 user user 47 Jul 21 21:53 oa-1-big-CompressionInfo.db
drwxrwxr-x 3 user user 4096 Jul 21 21:53 ..
drwxrwxr-x 2 user user 4096 Jul 21 21:53 .
-rw-rw-r-- 1 user user 149 Jul 21 21:53 manifest.json
[after some time]
INFO [SnapshotCleanup:1] 2024-07-21 22:01:46,818 SnapshotManager.java:243 -
Removing snapshot TableSnapshot{keyspaceName='system',
tableName='compaction_history', tableId=b4dbb7b4-dc49-3fb5-b3bf-ce6e434832ca,
tag='test', createdAt=2024-07-22T01:53:47.026Z, expiresAt=null,
snapshotDirs=[/tmp/apache-cassandra-5.1-SNAPSHOT/data/data2/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/snapshots/test,
/tmp/apache-cassandra-5.1-SNAPSHOT/data/data1/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/snapshots/test],
ephemeral=false}
$ ls -ltra data/data1/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/snapshots/test/
ls: cannot access 'data/data1/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/snapshots/test/': No such file or directory <-- other snapshot subdirectory "data1" was removed
{code}
> Centralize all snapshot operations to SnapshotManager and cache snapshots
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-18111
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18111
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Snapshots
> Reporter: Paulo Motta
> Assignee: Stefan Miklosovic
> Priority: Normal
> Fix For: 5.x
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> Every time {{nodetool listsnapshots}} is called, all data directories are
> scanned to find snapshots, which is inefficient.
> For example, fetching the
> {{org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize}} metric
> can take half a second (CASSANDRA-13338).
> This improvement will also allow snapshots to be efficiently queried via
> virtual tables (CASSANDRA-18102).
> In order to do this, we should:
> a) load all snapshots from disk during initialization
> b) keep a collection of snapshots on {{SnapshotManager}}
> c) update the snapshots collection anytime a new snapshot is taken or cleared
> d) detect when a snapshot is manually removed from disk.
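For illustration only, steps a) to d) from the issue description could be sketched as below. This is a Python toy and not the SnapshotManager implementation: the `SnapshotCache` class, its method names, keying by tag alone, and the `<data_dir>/<keyspace>/<table>/snapshots/<tag>` glob are all assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


class SnapshotCache:
    """Hypothetical in-memory index: snapshot tag -> list of snapshot directories."""

    def __init__(self):
        self.snapshots = {}  # (b) the collection kept in memory

    def load(self, data_dirs):
        # (a) scan every data directory once, at initialization
        for data_dir in data_dirs:
            for snap in Path(data_dir).glob("*/*/snapshots/*"):
                if snap.is_dir():
                    self.snapshots.setdefault(snap.name, []).append(snap)

    def add(self, tag, dirs):
        # (c) record a snapshot that was just taken
        self.snapshots[tag] = list(dirs)

    def clear(self, tag):
        # (c) forget a snapshot that was cleared
        self.snapshots.pop(tag, None)

    def prune_missing(self):
        # (d) drop entries whose directories have ALL vanished from disk
        gone = [tag for tag, dirs in self.snapshots.items()
                if not any(d.is_dir() for d in dirs)]
        for tag in gone:
            del self.snapshots[tag]
        return gone


# Usage with a throwaway layout; "ks" and "tbl" are placeholder names.
root = Path(tempfile.mkdtemp())
dirs = [root / name / "ks" / "tbl" / "snapshots" / "test"
        for name in ("data1", "data2")]
for d in dirs:
    d.mkdir(parents=True)

cache = SnapshotCache()
cache.load([root / "data1", root / "data2"])
loaded = sorted(cache.snapshots)
dir_count = len(cache.snapshots["test"])

shutil.rmtree(dirs[1])
first_prune = cache.prune_missing()    # one directory remains, entry is kept
shutil.rmtree(dirs[0])
second_prune = cache.prune_missing()   # all directories gone, entry is dropped
shutil.rmtree(root)
```

Listing snapshots then becomes a lookup in `cache.snapshots` instead of a scan of every data directory.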