Thanks for the proposal. I second Jordan that we need more abstraction in (1), 
e.g. most cloud provider allow for disk snapshots and starting nodes from a 
snapshot which would be a good mechanism if you find yourself there.

German
________________________________
From: Jordan West <jorda...@gmail.com>
Sent: Sunday, April 14, 2024 12:27 PM
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: [EXTERNAL] Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar 
for Live Migrating Instances

Thanks for proposing this CEP! We have something like this internally so I have 
some familiarity with the approach and the challenges. After reading the CEP a 
couple things come to mind:

1. I would like to see more abstraction of how the files get moved / put in 
place with the proposed solution being the default implementation. That would 
allow others to plug in alternatives means of data movement like pulling down 
backups from S3 or rsync, etc.

2. I do agree with Jon’s last email that the lifecycle / orchestration portion 
is the more challenging aspect. It would be nice to address that as well so we 
don’t end up with something like repair where the building blocks are there but 
the hard parts are left to the operator. I do, however, see that portion being 
done in a follow-on CEP to limit the scope of CEP-40 and have a higher chance 
for success by incrementally adding these features.

Jordan

On Thu, Apr 11, 2024 at 12:31 Jon Haddad 
<j...@jonhaddad.com<mailto:j...@jonhaddad.com>> wrote:
First off, let me apologize for my initial reply, it came off harsher than I 
had intended.

I know I didn't say it initially, but I like the idea of making it easier to 
replace a node.  I think it's probably not obvious to folks that you can use 
rsync (with stunnel, or alternatively rclone), and for a lot of teams it's 
intimidating to do so.  Whether it actually is easy or not to do with rsync is 
irrelevant.  Having tooling that does it right is better than duct taping 
things together.

So with that said, if you're looking to get feedback on how to make the CEP 
more generally useful, I have a couple thoughts.

> Managing the Cassandra processes like bringing them up or down while 
> migrating the instances.

Maybe I missed this, but I thought we already had support for managing the C* 
lifecycle with the sidecar?  Maybe I'm misremembering.  It seems to me that 
adding the ability to make this entire workflow self managed would be the 
biggest win, because having a live migrate *feature* instead of what's 
essentially a runbook would be far more useful.

> To verify whether the desired file set matches with source, only file path 
> and size is considered at the moment. Strict binary level verification is 
> deferred for later.

Scott already mentioned this is a problem and I agree, we cannot simply rely on 
file path and size.

TL;DR: I like the intention of the CEP.  I think it would be better if it 
managed the entire lifecycle of the migration, but you might not have an 
appetite to implement all that.

Jon


On Thu, Apr 11, 2024 at 10:01 AM Venkata Hari Krishna Nukala 
<n.v.harikrishna.apa...@gmail.com<mailto:n.v.harikrishna.apa...@gmail.com>> 
wrote:
Thanks Jon & Scott for taking time to go through this CEP and providing inputs.

I am completely with what Scott had mentioned earlier (I would have added more 
details into the CEP). Adding a few more points to the same.

Having a solution with Sidecar can make the migration easy without depending on 
rsync. At least in the cases I have seen, rsync is not enabled by default and 
most of them want to run OS/images with as minimal requirements as possible. 
Installing rsync requires admin privileges and syncing data is a manual 
operation. If an API is provided with Sidecar, then tooling can be built around 
it reducing the scope for manual errors.

From performance wise, at least in the cases I had seen, the File Streaming API 
in Sidecar performs a lot better. To give an idea on the performance, I would 
like to quote "up to 7 Gbps/instance writes (depending on hardware)" from 
CEP-28 as this CEP proposes to leverage the same.

For:

>When enabled for LCS, single sstable uplevel will mutate only the level of an 
>SSTable in its stats metadata component, which wouldn't alter the filename and 
>may not alter the length of the stats metadata component. A change to the 
>level of an SSTable on the source via single sstable uplevel may not be caught 
>by a digest based only on filename and length.

In this case file size may not change, but the timestamp of last modified time 
would change, right? It is addressed in section MIGRATING ONE INSTANCE, point 
2.b.ii which says "If a file is present at the destination but did not match 
(by size or timestamp) with the source file, then local file is deleted and 
added to list of files to download.". And after download by final data copy 
task, file should match with source.

On Thu, Apr 11, 2024 at 7:30 AM C. Scott Andreas 
<sc...@paradoxica.net<mailto:sc...@paradoxica.net>> wrote:
Oh, one note on this item:

>  The operator can ensure that files in the destination matches with the 
> source. In the first iteration of this feature, an API is introduced to 
> calculate digest for the list of file names and their lengths to identify any 
> mismatches. It does not validate the file contents at the binary level, but, 
> such feature can be added at a later point of time.

When enabled for LCS, single sstable uplevel will mutate only the level of an 
SSTable in its stats metadata component, which wouldn't alter the filename and 
may not alter the length of the stats metadata component. A change to the level 
of an SSTable on the source via single sstable uplevel may not be caught by a 
digest based only on filename and length.

Including the file’s modification timestamp would address this without 
requiring a deep hash of the data. This would be good to include to ensure 
SSTables aren’t downleveled unexpectedly during migration.

- Scott

On Apr 8, 2024, at 2:15 PM, C. Scott Andreas 
<sc...@paradoxica.net<mailto:sc...@paradoxica.net>> wrote:


Hi Jon,

Thanks for taking the time to read and reply to this proposal. Would encourage 
you to approach it from an attitude of seeking understanding on the part of the 
first-time CEP author, as this reply casts it off pretty quickly as NIH.

The proposal isn't mine, but I'll offer a few notes on where I see this as 
valuable:

– It's valuable for Cassandra to have an ecosystem-native mechanism of 
migrating data between physical/virtual instances outside the standard 
streaming path. As Hari mentions, the current ecosystem-native approach of 
executing repairs, decommissions, and bootstraps is time-consuming and 
cumbersome.

– An ecosystem-native solution is safer than a bunch of bash and rsync. 
Defining a safe protocol to migrate data between instances via rsync without 
downtime is surprisingly difficult - and even moreso to do safely and 
repeatedly at scale. Enabling this process to be orchestrated by a control 
plane mechanizing offical endpoints of the database and sidecar – rather than 
trying to move data around behind its back – is much safer than hoping one's 
cobbled together the right set of scripts to move data in a way that won't 
violate strong / transactional consistency guarantees. This complexity is kind 
of exemplified by the "Migrating One Instance" section of the doc and state 
machine diagram, which illustrates an approach to solving that problem.

– An ecosystem-native approach poses fewer security concerns than rsync. 
mTLS-authenticated endpoints in the sidecar for data movement eliminate the 
requirement for orchestration to occur via (typically) high-privilege SSH, 
which often allows for code execution of some form or complex efforts to scope 
SSH privileges of particular users; and eliminates the need to manage and 
secure rsyncd processes on each instance if not via SSH.

– An ecosystem-native approach is more instrumentable and measurable than 
rsync. Support for data migration endpoints in the sidecar would allow for 
metrics reporting, stats collection, and alerting via mature and modern 
mechanisms rather than monitoring the output of a shell script.

I'll yield to Hari to share more, though today is a public holiday in India.

I do see this CEP as solving an important problem.

Thanks,

– Scott

On Apr 8, 2024, at 10:23 AM, Jon Haddad 
<j...@jonhaddad.com<mailto:j...@jonhaddad.com>> wrote:


This seems like a lot of work to create an rsync alternative.  I can't really 
say I see the point.  I noticed your "rejected alternatives" mentions it with 
this note:


  *   However, it might not be permitted by the administrator or available in 
various environments such as Kubernetes or virtual instances like EC2. Enabling 
data transfer through a sidecar facilitates smooth instance migration.

This feels more like NIH than solving a real problem, as what you've listed is 
a hypothetical, and one that's easily addressed.

Jon


On Fri, Apr 5, 2024 at 3:47 AM Venkata Hari Krishna Nukala 
<n.v.harikrishna.apa...@gmail.com<mailto:n.v.harikrishna.apa...@gmail.com>> 
wrote:
Hi all,

I have filed CEP-40 [1] for live migrating Cassandra instances using the 
Cassandra Sidecar.

When someone needs to move all or a portion of the Cassandra nodes belonging to 
a cluster to different hosts, the traditional approach of Cassandra node 
replacement can be time-consuming due to repairs and the bootstrapping of new 
nodes. Depending on the volume of the storage service load, replacements 
(repair + bootstrap) may take anywhere from a few hours to days.

Proposing a Sidecar based solution to address these challenges. This solution 
proposes transferring data from the old host (source) to the new host 
(destination) and then bringing up the Cassandra process at the destination, to 
enable fast instance migration. This approach would help to minimise node 
downtime, as it is based on a Sidecar solution for data transfer and avoids 
repairs and bootstrap.

Looking forward to the discussions.

[1] 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances

Thanks!
Hari

Reply via email to