Sounds like part of a backup strategy. Probably worth chiming in on the sidecar issue: https://issues.apache.org/jira/browse/CASSSIDECAR-148.
IIRC, Medusa and Tablesnap both uploaded a manifest and didn't upload multiple copies of the same SSTables. I think this should definitely be part of our backup system.

Jon

On Sun, Jan 12, 2025 at 10:25 AM Štefan Miklošovič <smikloso...@apache.org> wrote:

> Hi,
>
> I would like to run this through the ML to gather feedback as we are
> contemplating making this happen.
>
> Currently, snapshots are just hard links, located in a snapshot directory,
> to files in the live data directory. That is super handy as it occupies
> virtually zero disk space etc. (as long as the underlying SSTables are not
> compacted away; then their size would "materialize").
>
> On the other hand, because it is a hard link, and it is not possible to make
> hard links across block devices (the infamous "Invalid cross-device link"
> error), snapshots can only ever be located on the very same disk
> Cassandra has its datadirs on.
>
> Imagine there is a company ABC which has a 10 TiB disk (or NFS share)
> mounted to a Cassandra node and they would like to use that as cheap /
> cold storage for snapshots. They do not care about the speed of such storage,
> nor do they care about how much space it occupies etc. when it comes to
> snapshots. On the other hand, they do not want to have snapshots occupying
> disk space where Cassandra has its data because they consider it to be a
> waste of space. They would like to utilize the fast disk and its space for
> production data to the max, and snapshots might eat a lot of that space
> unnecessarily.
>
> There might be a configuration property like "snapshot_root_dir:
> /mnt/nfs/cassandra" and if a snapshot is taken, it would just copy SSTables
> there, but we need to be a little bit smart here. (By default, it would all
> work as it does now - hard links to snapshot directories located under
> Cassandra's data_file_directories.)
>
> Because it is a copy, it occupies disk space. But if we took 100 snapshots
> on the same SSTables, we would not want to copy the same files 100 times.
> There is a very handy way to prevent this - unique SSTable identifiers
> (under the already existing uuid_sstable_identifiers_enabled property), so we
> could have a flat destination hierarchy where all SSTables would be located
> in the same directory and we would just check whether such an SSTable is
> already there before copying it. Snapshot manifests (currently under
> manifest.json) would then contain all SSTables a logical snapshot consists
> of.
>
> This would be possible only for _user snapshots_. All snapshots taken by
> Cassandra itself (diagnostic snapshots, snapshots upon repairs, snapshots
> against all system tables, ephemeral snapshots) would continue to be hard
> links and it would not be possible to locate them outside of live data
> dirs.
>
> The advantages / characteristics of this approach for user snapshots:
>
> 1. Cassandra will be able to create snapshots located on different devices.
> 2. From an implementation perspective it would be totally transparent;
> there will be no specific code about "where" we copy. We would just copy,
> from the Java perspective, as we copy anywhere else.
> 3. All the tooling would work as it does now - nodetool listsnapshots /
> clearsnapshot / snapshot. Same outputs, same behavior.
> 4. No need to use external tools copying SSTables to the desired destination,
> custom scripts, manual synchronisation ...
> 5. Snapshots located outside of Cassandra live data dirs would behave the
> same when it comes to snapshot TTL. (A TTL on a snapshot means that after a
> given period of time, it is automatically removed.) This logic would be
> the same. Hence, there is no need to reinvent the wheel when it comes
> to removing expired snapshots from the operator's perspective.
> 6. Such a solution would deduplicate SSTables so it would be as
> space-efficient as possible (but not as efficient as hard links, for the
> obvious reasons mentioned above).
>
> It seems to me that there has recently been a "push" to add more logic to
> Cassandra where it was previously delegated to external tooling; for
> example, the CEP around automatic repairs basically does what external
> tooling does, we just move it under Cassandra. We would love to get rid of
> a lot of tooling and custom-written logic around copying snapshot
> SSTables. From the implementation perspective it would be just plain Java,
> without any external dependencies etc. There seems to be a lot to gain for
> relatively straightforward additions to the snapshotting code.
>
> We did some serious housekeeping in CASSANDRA-18111 where we consolidated and
> centralized everything related to snapshot management, so we feel
> comfortable building logic like this on top of that. In fact,
> CASSANDRA-18111 was a prerequisite for this because we did not want to base
> this work on the pre-18111 state of things when it comes to snapshots (it was
> all over the code base, with fragmented and duplicated logic etc).
>
> WDYT?
>
> Regards
>
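
The deduplicating copy described in the quoted proposal (a flat destination directory keyed by unique SSTable file names, plus a per-snapshot manifest) could be sketched roughly as below. This is only a minimal Python illustration, not Cassandra code: the function name, the sstables/ and manifests/ subdirectories, and the manifest layout are assumptions made up for the example, and the proposal itself targets plain Java inside the snapshotting code. It also assumes file names are globally unique, which is what uuid_sstable_identifiers_enabled is meant to provide.

```python
import json
import shutil
from pathlib import Path


def take_copy_snapshot(sstable_files, snapshot_root, tag):
    """Copy SSTable files into a flat, deduplicated store and write a manifest.

    Hypothetical helper for illustration only; none of these names exist
    in Cassandra. Returns the names of the files actually copied.
    """
    store = Path(snapshot_root) / "sstables"       # assumed flat layout
    manifests = Path(snapshot_root) / "manifests"  # assumed manifest location
    store.mkdir(parents=True, exist_ok=True)
    manifests.mkdir(parents=True, exist_ok=True)

    copied, names = [], []
    for src in map(Path, sstable_files):
        dest = store / src.name
        if not dest.exists():
            # Copy under a temporary name, then rename, so an interrupted
            # copy is never mistaken for a complete SSTable component.
            tmp = dest.with_name(dest.name + ".tmp")
            shutil.copy2(src, tmp)
            tmp.replace(dest)
            copied.append(src.name)
        names.append(src.name)

    # The manifest records every file the logical snapshot consists of,
    # including files that were already present from earlier snapshots.
    manifest = manifests / f"{tag}.json"
    manifest.write_text(json.dumps({"files": sorted(names)}, indent=2))
    return copied
```

With this shape, taking a second snapshot over mostly unchanged SSTables copies only the new files, while each manifest still lists the complete set. Removing an expired snapshot would then need to delete only the files that no remaining manifest references, which the sketch does not cover.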