I think this solution would solve one of the problems that Aiven currently has with node replacement, though TCM will probably help as well.
On Mon, Apr 15, 2024 at 11:47 PM German Eichberger via dev <dev@cassandra.apache.org> wrote:

> Thanks for the proposal. I second Jordan that we need more abstraction in (1), e.g. most cloud providers allow for disk snapshots and starting nodes from a snapshot, which would be a good mechanism if you find yourself there.
>
> German

> ------------------------------
> *From:* Jordan West <jorda...@gmail.com>
> *Sent:* Sunday, April 14, 2024 12:27 PM
> *To:* dev@cassandra.apache.org <dev@cassandra.apache.org>
> *Subject:* [EXTERNAL] Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances
>
> Thanks for proposing this CEP! We have something like this internally, so I have some familiarity with the approach and the challenges. After reading the CEP, a couple of things come to mind:
>
> 1. I would like to see more abstraction of how the files get moved / put in place, with the proposed solution being the default implementation. That would allow others to plug in alternative means of data movement, like pulling down backups from S3, rsync, etc.
>
> 2. I do agree with Jon's last email that the lifecycle / orchestration portion is the more challenging aspect. It would be nice to address that as well, so we don't end up with something like repair, where the building blocks are there but the hard parts are left to the operator. I do, however, see that portion being done in a follow-on CEP to limit the scope of CEP-40 and give it a higher chance of success by incrementally adding these features.
>
> Jordan
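As a rough illustration of the abstraction Jordan asks for in point (1), a pluggable data-movement provider might look something like the sketch below. The names are hypothetical, not an existing Sidecar or CEP-40 API: the Sidecar file-streaming transfer would be the default implementation, with S3 restore, rsync, or provider disk snapshots (per German's note) as alternative implementations.

    // Hypothetical sketch only -- not part of CEP-40 or the Sidecar codebase.
    import java.nio.file.Path;
    import java.util.List;

    public interface DataMovementProvider
    {
        // Human-readable name, e.g. "sidecar-streaming", "s3-restore", "rsync", "disk-snapshot".
        String name();

        // Copy the given SSTable components from the source instance into destDir.
        void transfer(InstanceEndpoint source, List<Path> sstableComponents, Path destDir) throws Exception;

        // Check that what landed in destDir matches the source (e.g. by digest, size, mtime).
        boolean verify(InstanceEndpoint source, Path destDir) throws Exception;
    }

    // Minimal placeholder for how a source instance might be addressed.
    record InstanceEndpoint(String host, int sidecarPort) {}

With an interface like this, a snapshot-based approach becomes just another implementation selected by configuration rather than a separate migration path.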
> On Thu, Apr 11, 2024 at 12:31 Jon Haddad <j...@jonhaddad.com> wrote:
>
> First off, let me apologize for my initial reply; it came off harsher than I had intended.
>
> I know I didn't say it initially, but I like the idea of making it easier to replace a node. I think it's probably not obvious to folks that you can use rsync (with stunnel, or alternatively rclone), and for a lot of teams it's intimidating to do so. Whether it actually is easy to do with rsync is irrelevant. Having tooling that does it right is better than duct-taping things together.
>
> So with that said, if you're looking to get feedback on how to make the CEP more generally useful, I have a couple of thoughts.
>
> > Managing the Cassandra processes like bringing them up or down while migrating the instances.
>
> Maybe I missed this, but I thought we already had support for managing the C* lifecycle with the sidecar? Maybe I'm misremembering. It seems to me that adding the ability to make this entire workflow self-managed would be the biggest win, because having a live migrate *feature* instead of what's essentially a runbook would be far more useful.
>
> > To verify whether the desired file set matches with source, only file path and size is considered at the moment. Strict binary level verification is deferred for later.
>
> Scott already mentioned this is a problem and I agree, we cannot simply rely on file path and size.
>
> TL;DR: I like the intention of the CEP. I think it would be better if it managed the entire lifecycle of the migration, but you might not have an appetite to implement all that.
>
> Jon

> On Thu, Apr 11, 2024 at 10:01 AM Venkata Hari Krishna Nukala <n.v.harikrishna.apa...@gmail.com> wrote:
>
> Thanks Jon & Scott for taking the time to go through this CEP and providing inputs.
>
> I am completely with what Scott had mentioned earlier (I would have added more details into the CEP). Adding a few more points to the same.
>
> Having a solution with Sidecar can make the migration easy without depending on rsync. At least in the cases I have seen, rsync is not enabled by default, and most operators want to run OS/images with as few extra requirements as possible. Installing rsync requires admin privileges, and syncing data is a manual operation. If an API is provided with Sidecar, then tooling can be built around it, reducing the scope for manual errors.
>
> Performance-wise, at least in the cases I have seen, the File Streaming API in Sidecar performs a lot better. To give an idea of the performance, I would like to quote "up to 7 Gbps/instance writes (depending on hardware)" from CEP-28, as this CEP proposes to leverage the same.
>
> For:
>
> > When enabled for LCS, single sstable uplevel will mutate only the level of an SSTable in its stats metadata component, which wouldn't alter the filename and may not alter the length of the stats metadata component. A change to the level of an SSTable on the source via single sstable uplevel may not be caught by a digest based only on filename and length.
>
> In this case the file size may not change, but the last-modified timestamp would change, right? It is addressed in section MIGRATING ONE INSTANCE, point 2.b.ii, which says "If a file is present at the destination but did not match (by size or timestamp) with the source file, then local file is deleted and added to list of files to download." And after the download by the final data copy task, the file should match the source.

> On Thu, Apr 11, 2024 at 7:30 AM C. Scott Andreas <sc...@paradoxica.net> wrote:
>
> Oh, one note on this item:
>
> > The operator can ensure that files in the destination matches with the source. In the first iteration of this feature, an API is introduced to calculate digest for the list of file names and their lengths to identify any mismatches. It does not validate the file contents at the binary level, but, such feature can be added at a later point of time.
>
> When enabled for LCS, single sstable uplevel will mutate only the level of an SSTable in its stats metadata component, which wouldn't alter the filename and may not alter the length of the stats metadata component. A change to the level of an SSTable on the source via single sstable uplevel may not be caught by a digest based only on filename and length.
>
> Including the file's modification timestamp would address this without requiring a deep hash of the data. This would be good to include to ensure SSTables aren't downleveled unexpectedly during migration.
>
> - Scott
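To make the verification discussion concrete, here is a minimal, illustrative sketch (not the actual Sidecar digest API; class and method names are hypothetical) of a manifest digest computed over each file's relative path, length, and last-modified time. Because the last-modified time is included, a single-SSTable uplevel that only rewrites the stats metadata component still changes the digest, without requiring a deep hash of the file contents.

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.HexFormat;
    import java.util.stream.Stream;

    public final class ManifestDigest
    {
        public static String compute(Path dataDir) throws IOException, NoSuchAlgorithmException
        {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            try (Stream<Path> files = Files.walk(dataDir))
            {
                files.filter(Files::isRegularFile)
                     .sorted() // stable ordering so source and destination digests are comparable
                     .forEach(p -> {
                         try
                         {
                             String entry = dataDir.relativize(p) + "|"
                                            + Files.size(p) + "|"
                                            + Files.getLastModifiedTime(p).toMillis();
                             md.update(entry.getBytes(StandardCharsets.UTF_8));
                         }
                         catch (IOException e)
                         {
                             throw new RuntimeException(e);
                         }
                     });
            }
            return HexFormat.of().formatHex(md.digest());
        }
    }

Comparing this digest between source and destination would catch the LCS downleveling case Scott describes; strict binary verification could still be layered on later, as the CEP notes.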
> On Apr 8, 2024, at 2:15 PM, C. Scott Andreas <sc...@paradoxica.net> wrote:
>
> Hi Jon,
>
> Thanks for taking the time to read and reply to this proposal. I would encourage you to approach it from an attitude of seeking understanding on the part of the first-time CEP author, as this reply casts it off pretty quickly as NIH.
>
> The proposal isn't mine, but I'll offer a few notes on where I see this as valuable:
>
> – It's valuable for Cassandra to have an ecosystem-native mechanism of migrating data between physical/virtual instances outside the standard streaming path. As Hari mentions, the current ecosystem-native approach of executing repairs, decommissions, and bootstraps is time-consuming and cumbersome.
>
> – An ecosystem-native solution is safer than a bunch of bash and rsync. Defining a safe protocol to migrate data between instances via rsync without downtime is surprisingly difficult, and even more so to do safely and repeatedly at scale. Enabling this process to be orchestrated by a control plane mechanizing official endpoints of the database and sidecar, rather than trying to move data around behind its back, is much safer than hoping one has cobbled together the right set of scripts to move data in a way that won't violate strong / transactional consistency guarantees. This complexity is exemplified by the "Migrating One Instance" section of the doc and its state machine diagram, which illustrates an approach to solving that problem.
>
> – An ecosystem-native approach poses fewer security concerns than rsync. mTLS-authenticated endpoints in the sidecar for data movement eliminate the requirement for orchestration to occur via (typically) high-privilege SSH, which often allows for code execution of some form or requires complex efforts to scope the SSH privileges of particular users; and they eliminate the need to manage and secure rsyncd processes on each instance if not going via SSH.
>
> – An ecosystem-native approach is more instrumentable and measurable than rsync. Support for data migration endpoints in the sidecar would allow for metrics reporting, stats collection, and alerting via mature and modern mechanisms rather than monitoring the output of a shell script.
>
> I'll yield to Hari to share more, though today is a public holiday in India.
>
> I do see this CEP as solving an important problem.
>
> Thanks,
>
> – Scott

> On Apr 8, 2024, at 10:23 AM, Jon Haddad <j...@jonhaddad.com> wrote:
>
> This seems like a lot of work to create an rsync alternative. I can't really say I see the point. I noticed your "rejected alternatives" section mentions it with this note:
>
> > - However, it might not be permitted by the administrator or available in various environments such as Kubernetes or virtual instances like EC2. Enabling data transfer through a sidecar facilitates smooth instance migration.
>
> This feels more like NIH than solving a real problem, as what you've listed is a hypothetical, and one that's easily addressed.
>
> Jon

> On Fri, Apr 5, 2024 at 3:47 AM Venkata Hari Krishna Nukala <n.v.harikrishna.apa...@gmail.com> wrote:
>
> Hi all,
>
> I have filed CEP-40 [1] for live migrating Cassandra instances using the Cassandra Sidecar.
>
> When someone needs to move all or a portion of the Cassandra nodes belonging to a cluster to different hosts, the traditional approach of Cassandra node replacement can be time-consuming due to repairs and the bootstrapping of new nodes. Depending on the volume of the storage service load, replacements (repair + bootstrap) may take anywhere from a few hours to days.
>
> I am proposing a Sidecar-based solution to address these challenges. It transfers data from the old host (source) to the new host (destination) and then brings up the Cassandra process at the destination, enabling fast instance migration. This approach helps minimise node downtime, as it relies on the Sidecar for data transfer and avoids repairs and bootstrap.
>
> Looking forward to the discussions.
>
> [1] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
>
> Thanks!
> Hari
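Finally, to make the lifecycle/orchestration concern raised by Jon and Jordan concrete: one plausible shape for the per-instance flow is sketched below. The CEP's "Migrating One Instance" section and its state machine define the actual behaviour; the names and the exact ordering here are hypothetical.

    // Hypothetical sketch of the per-instance migration lifecycle being discussed;
    // CEP-40's "Migrating One Instance" section defines the real state machine.
    public abstract class LiveMigrationFlow
    {
        public final void migrateOneInstance() throws Exception
        {
            copyData();                  // bulk copy while the source Cassandra still serves traffic
            stopSourceCassandra();       // quiesce the source so no new or changed SSTables appear
            copyData();                  // final delta copy: re-fetch files that differ by size or timestamp
            if (!verifyFileSet())        // e.g. manifest digest over path, length, last-modified time
                throw new IllegalStateException("Destination file set does not match source");
            startDestinationCassandra(); // same identity/tokens, so no bootstrap or repair is needed
        }

        protected abstract void copyData() throws Exception;
        protected abstract void stopSourceCassandra() throws Exception;
        protected abstract boolean verifyFileSet() throws Exception;
        protected abstract void startDestinationCassandra() throws Exception;
    }

An orchestration layer, whether in a follow-on CEP as Jordan suggests or in external tooling, would wrap these steps with things like retries, progress reporting, and rollback.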