> might get roasted for scope creep

This community *would never*. What you've outlined seems like a very reasonable stretch goal or v2 to keep in mind, so that whatever we architect in v1 is also supportive of a v2 keyspace-only migration.
On Thu, Apr 25, 2024, at 1:57 PM, Venkata Hari Krishna Nukala wrote:
> I have updated the CEP to use binary-level file digest verification.
>
> In the next iteration, I am going to address the below point.
>
> > I would like to see more abstraction of how the files get moved / put in place, with the proposed solution being the default implementation. That would allow others to plug in alternative means of data movement, like pulling down backups from S3, rsync, etc.
>
> Thanks!
> Hari
>
> On Wed, Apr 24, 2024 at 1:24 AM Patrick McFadin <pmcfa...@gmail.com> wrote:
>> I finally got a chance to digest this CEP and am happy to see it raised. This feature has been left to the end user for far too long.
>>
>> It might get roasted for scope creep, but here goes. Related, and something that I've heard for years, is the ability to migrate a single keyspace away from a set of hardware... online. A similar problem, but with a lot more coordination:
>> - Create a keyspace in Cluster B mimicking the keyspace in Cluster A
>> - Establish replication between the keyspaces and sync schema
>> - Move data from Cluster A to B
>> - Decommission the keyspace in Cluster A
>>
>> In many cases, the presence of multiple tenants puts the cluster under pressure. The best solution in that case is to migrate the largest keyspace to a dedicated cluster.
>>
>> Live migration, but a bit more complicated. No chance of doing this manually without some serious brain surgery on C* and downtime.
>>
>> Patrick
>>
>> On Tue, Apr 23, 2024 at 11:37 AM Venkata Hari Krishna Nukala <n.v.harikrishna.apa...@gmail.com> wrote:
>>> Thank you all for the inputs, and apologies for the late reply. I see good points raised in this discussion. Please allow me to reply to each point individually.
>>>
>>> To start with, let me focus in this reply on the point raised by Scott & Jon about verifying file content at the destination against the source.
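[Editor's note: the pluggable data-movement abstraction Hari quotes above (sidecar streaming as the default, with S3 downloads or rsync as alternatives) could be sketched roughly as below. All names here, `DataMovementSource`, `fetch`, `LocalCopySource`, are invented for illustration and are not from the CEP or the Sidecar codebase.]

```python
from abc import ABC, abstractmethod
from pathlib import Path
import shutil

class DataMovementSource(ABC):
    """Hypothetical abstraction: one source of SSTable files. Implementations
    could stream from a peer sidecar, download a backup from S3, or shell
    out to rsync; the live-migration flow only depends on this interface."""

    @abstractmethod
    def fetch(self, relative_path: str, dest_dir: Path) -> Path:
        """Materialise one file under dest_dir and return its local path."""

class LocalCopySource(DataMovementSource):
    """Trivial implementation copying from a local snapshot directory,
    standing in for the sidecar-streaming default."""

    def __init__(self, snapshot_root: Path):
        self.snapshot_root = snapshot_root

    def fetch(self, relative_path: str, dest_dir: Path) -> Path:
        dest = dest_dir / relative_path
        dest.parent.mkdir(parents=True, exist_ok=True)
        # copy2 preserves mtime, useful when comparing file metadata later
        shutil.copy2(self.snapshot_root / relative_path, dest)
        return dest
```

An S3- or rsync-backed implementation would only need to supply its own `fetch`; the rest of the migration logic stays unchanged, which is the point of the abstraction Hari is being asked to add.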
>>> Agree that just verifying the file name + size is not foolproof. The reason I called binary-level verification out of the initial scope is twofold: 1) calculating a digest for each file may increase CPU utilisation, and 2) the disk would also be under pressure, as the complete disk contents would have to be read to calculate digests. But as called out in the discussion, I don't think we can compromise on a binary-level check just for those two reasons. Let me update the CEP to include binary-level verification. During implementation, it can probably be made optional so that it can be skipped if someone doesn't want it.
>>>
>>> Thanks!
>>> Hari
>>>
>>> On Mon, Apr 22, 2024 at 4:40 AM Slater, Ben via dev <dev@cassandra.apache.org> wrote:
>>>> We use backup/restore for our implementation of this concept. It has the added benefit that the backup/restore path gets exercised much more regularly than it would in normal operations, finding edge-case bugs at a time when you still have other ways of recovering, rather than in a full disaster scenario.
>>>>
>>>> Cheers
>>>> Ben
>>>>
>>>> *From:* Jordan West <jorda...@gmail.com>
>>>> *Date:* Sunday, 21 April 2024 at 05:38
>>>> *To:* dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>> *Subject:* Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances
>>>>
>>>> I do really like the framing of replacing a node as restoring a node and then kicking off a replace. That is effectively what we do internally.
>>>>
>>>> I also agree we should be able to do data movement well both internal to Cassandra and externally, for a variety of reasons.
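[Editor's note: the binary-level digest verification Hari agrees to adopt above can be done with a streaming hash, which bounds the memory cost even for very large SSTables; the CPU and disk-read costs he mentions remain. A minimal sketch, not the CEP's actual implementation:]

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1 << 20  # 1 MiB reads keep memory flat for multi-GB SSTables

def file_digest(path: Path) -> str:
    """Stream the file through SHA-256 so it is never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()

def transfer_verified(source_digest: str, local_copy: Path) -> bool:
    """Destination-side check: recompute the digest of the received file
    and compare it with the digest reported by the source before import."""
    return file_digest(local_copy) == source_digest
```

Making this optional, as Hari suggests, would just mean skipping `transfer_verified` and falling back to the name + size check.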
>>>> We’ve seen great performance with “ZCS+TLS” even though it’s not fully zero-copy — nodes that previously took *days* to replace now take a few hours. But we have seen it put pressure on nodes and drive up latencies, which is the main reason we still rely on an external data movement system by default, falling back to ZCS+TLS as needed.
>>>>
>>>> Jordan
>>>>
>>>> On Fri, Apr 19, 2024 at 19:15 Jon Haddad <j...@jonhaddad.com> wrote:
>>>>> Jeff, this is probably the best explanation and justification of the idea that I've heard so far.
>>>>>
>>>>> I like it because:
>>>>>
>>>>> 1) we really should have something official for backups
>>>>> 2) backups / object store would be great for analytics
>>>>> 3) it solves a much bigger problem than the single goal of moving instances
>>>>>
>>>>> I'm a huge +1 in favor of this perspective, with live migration being one use case for backup/restore.
>>>>>
>>>>> Jon
>>>>>
>>>>> On Fri, Apr 19, 2024 at 7:08 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>>>>> I think Jordan and German had an interesting insight, or at least their comment made me think about this slightly differently, and I’m going to repeat it so it’s not lost in the discussion about zero-copy / sendfile.
>>>>>>
>>>>>> The CEP treats this as “move a live instance from one machine to another”. I know why the author wants to do this.
>>>>>>
>>>>>> If you think of it instead as “change the backup/restore mechanism to be able to safely restore from a running instance”, you may end up with a cleaner abstraction that’s easier to think about (and may also be easier to generalize in clouds where you have other tools available).
>>>>>> I’m not familiar enough with the sidecar to know the state of orchestration for backup/restore, but “ensure the original source node isn’t running”, “migrate the config”, “choose and copy a snapshot”, and maybe “forcibly exclude the original instance from the cluster” are all things the restore code is going to need to do anyway, and if restore doesn’t do that today, it seems like we can solve it once.
>>>>>>
>>>>>> Backup probably needs to be generalized to support many sources, too. Object storage is obvious (S3 download). Block storage is obvious (snapshot and reattach). Reading sstables from another sidecar seems reasonable, too.
>>>>>>
>>>>>> It accomplishes the original goal, in largely the same fashion; it just makes the logic reusable for other purposes.
>>>>>>
>>>>>>> On Apr 19, 2024, at 5:52 PM, Dinesh Joshi <djo...@apache.org> wrote:
>>>>>>>
>>>>>>> On Thu, Apr 18, 2024 at 12:46 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>>>>> If there is a faster/better way to replace a node, why not have Cassandra support that natively without the sidecar, so people who aren’t running the sidecar can benefit?
>>>>>>>
>>>>>>> I am not the author of the CEP, so take whatever I say with a pinch of salt. Scott and Jordan have pointed out some benefits of doing this in the Sidecar vs Cassandra.
>>>>>>>
>>>>>>> Today Cassandra is able to do fast node replacements. However, this CEP addresses an important corner case: when Cassandra is unable to start up due to old / ailing hardware. Can we fix it in Cassandra so it doesn't die on old hardware? Sure. However, you would still need operator intervention to start it up in some special mode on both the old and new nodes so the new node can peer with the old node, copy over its data, and join the ring. This would still require some orchestration outside the database. The Sidecar can do that orchestration for the operator. The point I'm making here is that the CEP addresses a real issue, and the way it is currently built can improve over time with improvements in Cassandra.
>>>>>>>
>>>>>>> Dinesh
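[Editor's note: the orchestration steps Jeff lists and Dinesh elaborates on (fence the old node, migrate the config, copy a snapshot with verification, exclude the old instance, start the new one) can be sequenced as below. Every function name here is hypothetical, invented purely to show the ordering; none of them are real Sidecar APIs.]

```python
def live_migrate(source, dest):
    """Illustrative sequencing of a sidecar-driven live migration.

    `source` and `dest` are hypothetical handles to the sidecars on the
    old and new machines; the method names are assumptions, not real APIs.
    """
    # 1. Ensure the original source node isn't running (Jeff's first step).
    assert not source.cassandra_is_running(), "source must be stopped first"

    # 2. Migrate the config so the new node assumes the old identity.
    dest.write_config(source.read_config())

    # 3. Choose a snapshot and copy it over, verifying each file's digest.
    snapshot = source.choose_snapshot()
    for f in snapshot.files():
        dest.copy_and_verify(f)

    # 4. Forcibly exclude the original instance from the cluster.
    source.exclude_from_cluster()

    # 5. Start the new node so it joins the ring in place of the old one.
    dest.start_cassandra()
```

Framed this way, steps 2-4 are exactly what a generalized restore would need, which is Jeff's point: solve it once in backup/restore and live migration becomes one caller of that logic.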