Sounds like part of a backup strategy.    Probably worth chiming in on the
sidecar issue: https://issues.apache.org/jira/browse/CASSSIDECAR-148.

IIRC, Medusa and Tablesnap both upload a manifest and avoid uploading
multiple copies of the same SSTables.  I think this should definitely be
part of our backup system.

Jon



On Sun, Jan 12, 2025 at 10:25 AM Štefan Miklošovič <smikloso...@apache.org>
wrote:

> Hi,
>
> I would like to run this through the ML to gather feedback, as we are
> contemplating making this happen.
>
> Currently, snapshots are just hard links in a snapshot directory pointing
> to files in the live data directory. That is super handy as it occupies
> virtually zero disk space (as long as the underlying SSTables are not
> compacted away; then their size would "materialize").
>
> On the other hand, because it is a hard link, it is not possible to link
> across block devices (the infamous "Invalid cross-device link" error).
> That means that snapshots can only ever be located on the very same disk
> Cassandra has its data dirs on.
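>
> For illustration, attempting a hard link across mount points fails at the
> OS level (paths made up, output abbreviated):
>
>     $ ln /var/lib/cassandra/data/ks/tbl/nb-1-big-Data.db /mnt/nfs/
>     ln: failed to create hard link ... Invalid cross-device link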
>
> Imagine there is a company ABC which has a 10 TiB disk (or NFS share)
> mounted to a Cassandra node, and they would like to use that as cheap /
> cold storage for snapshots. They do not care about the speed of such
> storage, nor do they care about how much space the snapshots occupy. On
> the other hand, they do not want snapshots occupying the disk space where
> Cassandra has its data, because they consider that a waste of space. They
> would like to utilize the fast disk, and its space, for production data to
> the max, and snapshots might eat a lot of that space unnecessarily.
>
> There might be a configuration property like "snapshot_root_dir:
> /mnt/nfs/cassandra", and if a snapshot were taken, it would just copy the
> SSTables there, but we need to be a little bit smart here. (By default, it
> would all work as it does now: hard links to snapshot directories located
> under Cassandra's data_file_directories.)
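>
> A minimal cassandra.yaml sketch of how this could look (snapshot_root_dir
> is only the proposed name, not an existing option; data_file_directories
> is the existing property):
>
>     # fast local storage for live data (existing property)
>     data_file_directories:
>         - /var/lib/cassandra/data
>     # proposed: user snapshots would be copied here instead of hard-linked
>     snapshot_root_dir: /mnt/nfs/cassandra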
>
> Because it is a copy, it occupies disk space. But if we took 100 snapshots
> of the same SSTables, we would not want to copy the same files 100 times.
> There is a very handy way to prevent this: unique SSTable identifiers
> (under the already existing uuid_sstable_identifiers_enabled property).
> With those, we could have a flat destination hierarchy where all SSTables
> are located in the same directory, and we would just check whether such an
> SSTable is already there before copying it. Snapshot manifests (currently
> manifest.json) would then contain all the SSTables a logical snapshot
> consists of.
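>
> To illustrate the idea, a minimal Java sketch of copy-if-absent (class and
> method names are made up for illustration, not the actual implementation):
>
>     import java.io.IOException;
>     import java.nio.file.Files;
>     import java.nio.file.Path;
>     import java.nio.file.StandardCopyOption;
>
>     final class SnapshotCopier
>     {
>         // Copy one SSTable component into the flat snapshot root, skipping
>         // it when a file with the same unique (UUID-based) name is present.
>         static void copyIfAbsent(Path component, Path snapshotRoot) throws IOException
>         {
>             Path dest = snapshotRoot.resolve(component.getFileName());
>             if (!Files.exists(dest))
>                 Files.copy(component, dest, StandardCopyOption.COPY_ATTRIBUTES);
>         }
>     }
>
> This only works because UUID-based identifiers make SSTable file names
> globally unique; with sequential generation ids, two different tables
> could produce colliding names in a flat directory.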
>
> This would be possible only for _user snapshots_. All snapshots taken by
> Cassandra itself (diagnostic snapshots, snapshots upon repairs, snapshots
> of system tables, ephemeral snapshots) would continue to be hard links,
> and it would not be possible to locate them outside of the live data dirs.
>
> The advantages / characteristics of this approach for user snapshots:
>
> 1. Cassandra will be able to create snapshots located on different devices.
> 2. From an implementation perspective it would be totally transparent;
> there would be no code specific to "where" we copy. From the Java
> perspective, we would just copy as we copy anywhere else.
> 3. All the tooling would work as it does now - nodetool listsnapshots /
> clearsnapshot / snapshot. Same outputs, same behavior.
> 4. No need to use external tools to copy SSTables to the desired
> destination, custom scripts, manual synchronisation ...
> 5. Snapshots located outside of Cassandra's live data dirs would behave
> the same when it comes to snapshot TTL. (A TTL on a snapshot means that
> after the given period of time it is automatically removed; see the example
> after this list.) This logic would be the same. Hence, there is no need to
> re-invent the wheel when it comes to removing expired snapshots from the
> operator's perspective.
> 6. Such a solution would deduplicate SSTables, so it would be as
> space-efficient as possible (though not as efficient as hard links, for
> the obvious reasons mentioned above).
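>
> As an example for point 5, a usage sketch assuming the existing --ttl
> option of nodetool snapshot; the TTL mechanism would be reused as-is, only
> the storage location of the copied files would differ:
>
>     # take a user snapshot that is automatically removed after three days
>     nodetool snapshot --ttl 3d -t nightly my_keyspace
>
>     # list existing snapshots
>     nodetool listsnapshots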
>
> It seems to me that there has recently been a "push" to add more logic to
> Cassandra that was previously delegated to external tooling; for example,
> the CEP around automatic repairs basically does what external tooling
> does, just moved under Cassandra. We would love to get rid of a lot of
> tooling and custom-written logic around copying snapshot SSTables. From
> the implementation perspective it would be just plain Java, without any
> external dependencies etc. There seems to be a lot to gain from relatively
> straightforward additions to the snapshotting code.
>
> We did serious housekeeping in CASSANDRA-18111, where we consolidated and
> centralized everything related to snapshot management, so we feel
> comfortable building logic like this on top of that. In fact,
> CASSANDRA-18111 was a prerequisite for this, because we did not want to
> base this work on the pre-18111 state of things when it comes to snapshots
> (the logic was fragmented and duplicated all over the code base, etc.).
>
> WDYT?
>
> Regards
>
