> might get roasted for scope creep

This community *would never*. What you've outlined seems like a very reasonable stretch goal or v2 to keep in mind, so that whatever we architect in v1 is also supportive of a v2 keyspace-only migration.
On Thu, Apr 25, 2024, at 1:57 PM, Venkata Hari Krishna Nukala wrote:
> I have updated the CEP to use binary-level file digest verification.
>
> In the next iteration, I am going to address the below point.
>
> > I would like to see more abstraction of how the files get moved / put in place, with the proposed solution being the default implementation. That would allow others to plug in alternative means of data movement, like pulling down backups from S3, rsync, etc.
>
> Thanks!
> Hari
>
> On Wed, Apr 24, 2024 at 1:24 AM Patrick McFadin <pmcfa...@gmail.com> wrote:
>> I finally got a chance to digest this CEP and am happy to see it raised. This feature has been left to the end user for far too long.
>>
>> It might get roasted for scope creep, but here goes. Related, and something that I've heard for years, is the ability to migrate a single keyspace away from a set of hardware... online. A similar problem, but with a lot more coordination:
>> - Create a keyspace in Cluster B mimicking the keyspace in Cluster A
>> - Establish replication between the keyspaces and sync schema
>> - Move data from Cluster A to B
>> - Decommission the keyspace in Cluster A
>>
>> In many cases, the presence of multiple tenants puts the cluster under pressure. The best solution in that case is to migrate the largest keyspace to a dedicated cluster.
>>
>> Live migration, but a bit more complicated. No chance of doing this manually without some serious brain surgery on C* and downtime.
>>
>> Patrick
>>
>> On Tue, Apr 23, 2024 at 11:37 AM Venkata Hari Krishna Nukala <n.v.harikrishna.apa...@gmail.com> wrote:
>>> Thank you all for the inputs, and apologies for the late reply. I see good points raised in this discussion. Please allow me to reply to each point individually.
>>>
>>> To start with, let me focus in this reply on the point raised by Scott & Jon about verifying file content at the destination against the source.
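[Editor's note: the pluggable data-movement abstraction Hari quotes above (sidecar streaming as the default, with S3 downloads or rsync as alternatives) could be sketched roughly as below. All names here, `DataMovementSource`, `fetch`, `LocalCopySource`, are invented for illustration and are not from the CEP or the Sidecar codebase.]

```python
from abc import ABC, abstractmethod
from pathlib import Path
import shutil

class DataMovementSource(ABC):
    """Hypothetical abstraction: one source of SSTable files. Implementations
    could stream from a peer sidecar, download a backup from S3, or shell
    out to rsync; the live-migration flow only depends on this interface."""

    @abstractmethod
    def fetch(self, relative_path: str, dest_dir: Path) -> Path:
        """Materialise one file under dest_dir and return its local path."""

class LocalCopySource(DataMovementSource):
    """Trivial implementation copying from a local snapshot directory,
    standing in for the sidecar-streaming default."""

    def __init__(self, snapshot_root: Path):
        self.snapshot_root = snapshot_root

    def fetch(self, relative_path: str, dest_dir: Path) -> Path:
        dest = dest_dir / relative_path
        dest.parent.mkdir(parents=True, exist_ok=True)
        # copy2 preserves mtime, useful when comparing file metadata later
        shutil.copy2(self.snapshot_root / relative_path, dest)
        return dest
```

An S3- or rsync-backed implementation would only need to supply its own `fetch`; the rest of the migration logic stays unchanged, which is the point of the abstraction Hari is being asked to add.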
>>> Agree that just verifying the file name + size is not foolproof. The reason I called binary-level verification out of the initial scope is twofold: 1) calculating a digest for each file may increase CPU utilisation, and 2) the disk would also be under pressure, as the complete disk contents would have to be read to calculate digests. But as called out in the discussion, I don't think we can compromise on a binary-level check just for those two reasons. Let me update the CEP to include binary-level verification. During implementation, it can probably be made optional so that it can be skipped if someone doesn't want it.
>>>
>>> Thanks!
>>> Hari
>>>
>>> On Mon, Apr 22, 2024 at 4:40 AM Slater, Ben via dev <dev@cassandra.apache.org> wrote:
>>>> We use backup/restore for our implementation of this concept. It has the added benefit that the backup/restore path gets exercised much more regularly than it would in normal operations, finding edge-case bugs at a time when you still have other ways of recovering, rather than in a full disaster scenario.
>>>>
>>>> Cheers
>>>> Ben
>>>>
>>>> *From:* Jordan West <jorda...@gmail.com>
>>>> *Date:* Sunday, 21 April 2024 at 05:38
>>>> *To:* dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>> *Subject:* Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances
>>>>
>>>> I do really like the framing of replacing a node as restoring a node and then kicking off a replace. That is effectively what we do internally.
>>>>
>>>> I also agree we should be able to do data movement well both internal to Cassandra and externally, for a variety of reasons.
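[Editor's note: the binary-level digest verification Hari agrees to adopt above can be done with a streaming hash, which bounds the memory cost even for very large SSTables; the CPU and disk-read costs he mentions remain. A minimal sketch, not the CEP's actual implementation:]

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1 << 20  # 1 MiB reads keep memory flat for multi-GB SSTables

def file_digest(path: Path) -> str:
    """Stream the file through SHA-256 so it is never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()

def transfer_verified(source_digest: str, local_copy: Path) -> bool:
    """Destination-side check: recompute the digest of the received file
    and compare it with the digest reported by the source before import."""
    return file_digest(local_copy) == source_digest
```

Making this optional, as Hari suggests, would just mean skipping `transfer_verified` and falling back to the name + size check.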
>>>> We’ve seen great performance with “ZCS+TLS” even though it’s not fully zero-copy — nodes that previously took *days* to replace now take a few hours. But we have seen it put pressure on nodes and drive up latencies, which is the main reason we still rely on an external data movement system by default, falling back to ZCS+TLS as needed.
>>>>
>>>> Jordan
>>>>
>>>> On Fri, Apr 19, 2024 at 19:15 Jon Haddad <j...@jonhaddad.com> wrote:
>>>>> Jeff, this is probably the best explanation and justification of the idea that I've heard so far.
>>>>>
>>>>> I like it because:
>>>>>
>>>>> 1) we really should have something official for backups
>>>>> 2) backups / object store would be great for analytics
>>>>> 3) it solves a much bigger problem than the single goal of moving instances
>>>>>
>>>>> I'm a huge +1 in favor of this perspective, with live migration being one use case for backup/restore.
>>>>>
>>>>> Jon
>>>>>
>>>>> On Fri, Apr 19, 2024 at 7:08 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>>>>> I think Jordan and German had an interesting insight, or at least their comment made me think about this slightly differently, and I’m going to repeat it so it’s not lost in the discussion about zero-copy / sendfile.
>>>>>>
>>>>>> The CEP treats this as “move a live instance from one machine to another”. I know why the author wants to do this.
>>>>>>
>>>>>> If you think of it instead as “change the backup/restore mechanism to be able to safely restore from a running instance”, you may end up with a cleaner abstraction that’s easier to think about (and may also be easier to generalize in clouds where you have other tools available).
>>>>>> I’m not familiar enough with the sidecar to know the state of orchestration for backup/restore, but “ensure the original source node isn’t running”, “migrate the config”, “choose and copy a snapshot”, and maybe “forcibly exclude the original instance from the cluster” are all things the restore code is going to need to do anyway, and if restore doesn’t do that today, it seems like we can solve it once.
>>>>>>
>>>>>> Backup probably needs to be generalized to support many sources, too. Object storage is obvious (S3 download). Block storage is obvious (snapshot and reattach). Reading sstables from another sidecar seems reasonable, too.
>>>>>>
>>>>>> It accomplishes the original goal, in largely the same fashion; it just makes the logic reusable for other purposes.
>>>>>>
>>>>>>> On Apr 19, 2024, at 5:52 PM, Dinesh Joshi <djo...@apache.org> wrote:
>>>>>>>
>>>>>>> On Thu, Apr 18, 2024 at 12:46 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>>>>> If there is a faster/better way to replace a node, why not have Cassandra support that natively without the sidecar, so people who aren’t running the sidecar can benefit?
>>>>>>>
>>>>>>> I am not the author of the CEP, so take whatever I say with a pinch of salt. Scott and Jordan have pointed out some benefits of doing this in the Sidecar vs Cassandra.
>>>>>>>
>>>>>>> Today Cassandra is able to do fast node replacements. However, this CEP addresses an important corner case: when Cassandra is unable to start up due to old / ailing hardware. Can we fix it in Cassandra so it doesn't die on old hardware? Sure. However, you would still need operator intervention to start it up in some special mode on both the old and new nodes so the new node can peer with the old node, copy over its data, and join the ring. This would still require some orchestration outside the database. The Sidecar can do that orchestration for the operator. The point I'm making here is that the CEP addresses a real issue, and the way it is currently built can improve over time with improvements in Cassandra.
>>>>>>>
>>>>>>> Dinesh
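[Editor's note: the orchestration steps Jeff lists and Dinesh elaborates on (fence the old node, migrate the config, copy a snapshot with verification, exclude the old instance, start the new one) can be sequenced as below. Every function name here is hypothetical, invented purely to show the ordering; none of them are real Sidecar APIs.]

```python
def live_migrate(source, dest):
    """Illustrative sequencing of a sidecar-driven live migration.

    `source` and `dest` are hypothetical handles to the sidecars on the
    old and new machines; the method names are assumptions, not real APIs.
    """
    # 1. Ensure the original source node isn't running (Jeff's first step).
    assert not source.cassandra_is_running(), "source must be stopped first"

    # 2. Migrate the config so the new node assumes the old identity.
    dest.write_config(source.read_config())

    # 3. Choose a snapshot and copy it over, verifying each file's digest.
    snapshot = source.choose_snapshot()
    for f in snapshot.files():
        dest.copy_and_verify(f)

    # 4. Forcibly exclude the original instance from the cluster.
    source.exclude_from_cluster()

    # 5. Start the new node so it joins the ring in place of the old one.
    dest.start_cassandra()
```

Framed this way, steps 2-4 are exactly what a generalized restore would need, which is Jeff's point: solve it once in backup/restore and live migration becomes one caller of that logic.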