I have updated the CEP to use binary-level file digest verification.
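For illustration only (the helper names are hypothetical and the CEP may settle on a different digest algorithm), a streaming digest comparison along these lines reads each file once in fixed-size chunks, which keeps memory flat even for large SSTables:

```python
import hashlib

CHUNK_SIZE = 1 << 20  # 1 MiB reads keep memory usage flat for large SSTables


def file_digest(path: str) -> str:
    """Stream a file through SHA-256 without loading it into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            h.update(chunk)
    return h.hexdigest()


def verify_transfer(source_path: str, dest_path: str) -> bool:
    """Binary-level check: digests match only if the file contents match,
    unlike a name + size comparison."""
    return file_digest(source_path) == file_digest(dest_path)
```

Because the digest is computed while streaming, the extra cost is mostly CPU plus one full read of the data, which is the trade-off discussed below.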

In the next iteration, I am going to address the point below.
> I would like to see more abstraction of how the files get moved / put in
place with the proposed solution being the default implementation. That
would allow others to plug in alternatives means of data movement like
pulling down backups from S3 or rsync, etc.
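As a sketch of what such an abstraction might look like (the names here are hypothetical, not from the CEP), the default file-streaming path and alternatives like S3 downloads or rsync could all sit behind one interface, so the migration orchestration never cares about the transport:

```python
import shutil
from abc import ABC, abstractmethod


class DataMover(ABC):
    """Pluggable strategy for placing SSTable files on the new instance."""

    @abstractmethod
    def fetch(self, source: str, destination: str) -> None:
        """Place the file identified by `source` at `destination`."""


class LocalCopyMover(DataMover):
    """Stand-in default: copy from a locally reachable path.

    Alternative implementations (S3 download, rsync, block-snapshot
    reattach) would subclass DataMover the same way, so the orchestration
    only ever calls mover.fetch(...).
    """

    def fetch(self, source: str, destination: str) -> None:
        shutil.copyfile(source, destination)
```

The key point is that the migration logic depends only on `DataMover`, and operators can register whichever implementation fits their environment.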

Thanks!
Hari

On Wed, Apr 24, 2024 at 1:24 AM Patrick McFadin <pmcfa...@gmail.com> wrote:

> I finally got a chance to digest this CEP and am happy to see it raised.
> This feature has been left to the end user for far too long.
>
> It might get roasted for scope creep, but here goes. Related, and something
> I've heard for years, is the ability to migrate a single keyspace away
> from a set of hardware... online. It's a similar problem, but with a lot
> more coordination:
>  - Create a Keyspace in Cluster B mimicking keyspace in Cluster A
>  - Establish replication between keyspaces and sync schema
>  - Move data from Cluster A to B
>  - Decommission keyspace in Cluster A
>
> In many cases, the presence of multiple tenants puts the cluster under
> pressure. The best solution in that case is to migrate the largest keyspace
> to a dedicated cluster.
>
> It's live migration, but a bit more complicated. There's no chance of doing
> this manually without some serious brain surgery on c* and downtime.
>
> Patrick
>
>
> On Tue, Apr 23, 2024 at 11:37 AM Venkata Hari Krishna Nukala <
> n.v.harikrishna.apa...@gmail.com> wrote:
>
>> Thank you all for the inputs and apologies for the late reply. I see good
>> points raised in this discussion. *Please allow me to reply to each
>> point individually.*
>>
>> To start with, let me focus in this reply on the point raised by Scott &
>> Jon about verifying file content at the destination against the source.
>> I agree that just verifying the file name + size is not foolproof. The
>> reason I left binary-level verification out of the initial scope is
>> twofold: 1) calculating a digest for each file may increase CPU
>> utilisation, and 2) the disk would also be under pressure, as the complete
>> disk contents have to be read to calculate the digests. But as called out
>> in the discussion, I don't think we can compromise on a binary-level check
>> because of these two costs. Let me update the CEP to include binary-level
>> verification. During implementation, it can probably be made optional so
>> that it can be skipped if someone doesn't want it.
>>
>> Thanks!
>> Hari
>>
>> On Mon, Apr 22, 2024 at 4:40 AM Slater, Ben via dev <
>> dev@cassandra.apache.org> wrote:
>>
>>> We use backup/restore for our implementation of this concept. It has the
>>> added benefit that the backup/restore path gets exercised much more
>>> regularly than it otherwise would, finding edge-case bugs at a
>>> time when you still have other ways of recovering, rather than in a full
>>> disaster scenario.
>>>
>>>
>>>
>>> Cheers
>>>
>>> Ben
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From: *Jordan West <jorda...@gmail.com>
>>> *Date: *Sunday, 21 April 2024 at 05:38
>>> *To: *dev@cassandra.apache.org <dev@cassandra.apache.org>
>>> *Subject: *Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar
>>> for Live Migrating Instances
>>>
>>>
>>> I really do like the framing that replacing a node is restoring a node and
>>> then kicking off a replace. That is effectively what we do internally.
>>>
>>>
>>>
>>> I also agree we should be able to do data movement well both internal to
>>> Cassandra and externally for a variety of reasons.
>>>
>>>
>>>
>>> We’ve seen great performance with “ZCS+TLS” even though it’s not full
>>> zero copy — nodes that previously took *days* to replace now take a few
>>> hours. But we have seen it put pressure on nodes and drive up latencies
>>> which is the main reason we still rely on an external data movement system
>>> by default — falling back to ZCS+TLS as needed.
>>>
>>>
>>>
>>> Jordan
>>>
>>>
>>>
>>> On Fri, Apr 19, 2024 at 19:15 Jon Haddad <j...@jonhaddad.com> wrote:
>>>
>>> Jeff, this is probably the best explanation and justification of the
>>> idea that I've heard so far.
>>>
>>>
>>>
>>> I like it because
>>>
>>>
>>>
>>> 1) we really should have something official for backups
>>>
>>> 2) backups / object store would be great for analytics
>>>
>>> 3) it solves a much bigger problem than the single goal of moving
>>> instances.
>>>
>>>
>>>
>>> I'm a huge +1 in favor of this perspective, with live migration being
>>> one use case for backup / restore.
>>>
>>>
>>>
>>> Jon
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Apr 19, 2024 at 7:08 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>> I think Jordan and German had an interesting insight, or at least their
>>> comment made me think about this slightly differently, and I’m going to
>>> repeat it so it’s not lost in the discussion about zerocopy / sendfile.
>>>
>>>
>>>
>>> The CEP treats this as “move a live instance from one machine to
>>> another”. I know why the author wants to do this.
>>>
>>>
>>>
>>> If you think of it instead as “change backup/restore mechanism to be
>>> able to safely restore from a running instance”, you may end up with a
>>> cleaner abstraction that’s easier to think about (and may also be easier to
>>> generalize in clouds where you have other tools available ).
>>>
>>>
>>>
>>> I’m not familiar enough with the sidecar to know the state of
>>> orchestration for backup/restore, but “ensure the original source node
>>> isn’t running” , “migrate the config”, “choose and copy a snapshot” , maybe
>>> “forcibly exclude the original instance from the cluster” are all things
>>> the restore code is going to need to do anyway, and if restore doesn’t do
>>> that today, it seems like we can solve it once.
>>>
>>>
>>>
>>> Backup probably needs to be generalized to support many sources, too.
>>> Object storage is obvious (s3 download). Block storage is obvious (snapshot
>>> and reattach). Reading sstables from another sidecar seems reasonable, too.
>>>
>>>
>>>
>>> It accomplishes the original goal in largely the same fashion; it just
>>> makes the logic reusable for other purposes?
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Apr 19, 2024, at 5:52 PM, Dinesh Joshi <djo...@apache.org> wrote:
>>>
>>>
>>>
>>> On Thu, Apr 18, 2024 at 12:46 PM Ariel Weisberg <ar...@weisberg.ws>
>>> wrote:
>>>
>>>
>>>
>>> If there is a faster/better way to replace a node, why not have
>>> Cassandra support that natively, without the sidecar, so people who aren’t
>>> running the sidecar can benefit?
>>>
>>>
>>>
>>> I am not the author of the CEP so take whatever I say with a pinch of
>>> salt. Scott and Jordan have pointed out some benefits of doing this in the
>>> Sidecar vs Cassandra.
>>>
>>>
>>>
>>> Today Cassandra is able to do fast node replacements. However, this CEP
>>> is addressing an important corner case when Cassandra is unable to start up
>>> due to old / ailing hardware. Can we fix it in Cassandra so it doesn't die
>>> on old hardware? Sure. However, you would still need operator intervention
>>> to start it up in some special mode both on the old and new node so the new
>>> node can peer with the old node, copy over its data and join the ring. This
>>> would still require some orchestration outside the database. The Sidecar
>>> can do that orchestration for the operator. The point I'm making here is
>>> that the CEP addresses a real issue. The way it is currently built can
>>> improve over time with improvements in Cassandra.
>>>
>>>
>>>
>>> Dinesh
>>>
>>>
>>>
>>>
