Re: [Discuss] Repair inside C*

Chris Lohfink Mon, 21 Oct 2024 09:21:22 -0700

> I actually think we should be looking at how we can move things out of
> the database process.


While worth pursuing, I think we would need a different CEP just to figure
out how to do that. Not only is there a lot of infrastructure difficulty in
running multiple processes, the inter-app communication needs something
better than JMX. Even for the sidecar, we don't have a solid story yet on
how to ensure both processes are running. It's up to each app owner to
figure it out. Once we have a good thing in place, I think we can start
moving compactions, repairs, etc. out of the database. Even then, it's the
_repairs_ that are expensive, not the scheduling.

On Mon, Oct 21, 2024 at 9:45 AM Jeremiah Jordan <[email protected]>
wrote:

> I love the idea of a repair service being there by default for an install
> of C*.  My main concern here is that it is putting more services into the
> main database process.  I actually think we should be looking at how we can
> move things out of the database process.  The C* process being a giant
> monolith has always been a pain point.  Is there any way it makes sense for
> this to be an external process rather than a new thread pool inside the C*
> process?
>
> -Jeremiah Jordan
>
> On Oct 18, 2024 at 2:58:15 PM, Mick Semb Wever <[email protected]> wrote:
>
>>
>> This is looking strong, thanks Jaydeep.
>>
>> I would suggest folk take a look at the design doc and the PR in the
>> CEP.  A lot is there (that I have completely missed).
>>
>> I would especially ask all authors of prior art (Reaper, DSE nodesync,
>> ecchronos) to take a final review of the proposal.
>>
>> Jaydeep, can we ask for a two week window while we reach out to these
>> people ?  There's a lot of prior art in this space, and it feels like we're
>> in a good place now where it's clear this has legs and we can use that to
>> bring folk in and make sure there's no remaining blindspots.
>>
>>
>> On Fri, 18 Oct 2024 at 01:40, Jaydeep Chovatia <
>> [email protected]> wrote:
>>
>>> Sorry, there is a typo in the CEP-37 link; here is the correct link
>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution>
>>>
>>>
>>> On Thu, Oct 17, 2024 at 4:36 PM Jaydeep Chovatia <
>>> [email protected]> wrote:
>>>
>>>> First, thank you for your patience while we strengthened the CEP-37.
>>>>
>>>>
>>>> Over the last eight months, Chris Lohfink, Andy Tolbert, Josh McKenzie,
>>>> Dinesh Joshi, Kristijonas Zalys, and I have done tons of work (online
>>>> discussions/a dedicated Slack channel #cassandra-repair-scheduling-cep37)
>>>> to come up with the best possible design that not only significantly
>>>> simplifies repair operations but also includes the most common features
>>>> that everyone will benefit from running at Scale.
>>>>
>>>> For example,
>>>>
>>>>    -
>>>>
>>>>    Apache Cassandra must be capable of running multiple repair types,
>>>>    such as Full, Incremental, Paxos, and Preview - so the framework should
>>>>    be easily extendable with no additional overhead from the operator’s
>>>>    point of view.
>>>>    -
>>>>
>>>>    An easy way to extend the token-split calculation algorithm with a
>>>>    default implementation should exist.
>>>>    -
>>>>
>>>>    Running incremental repair reliably at Scale is pretty challenging,
>>>>    so we need to place safeguards, such as migration/rollback w/o restart
>>>>    and stopping incremental repair automatically if the disk is about to
>>>>    get full.
>>>>
>>>> We are glad to inform you that CEP-37 (i.e., Repair inside Cassandra)
>>>> is now officially ready for review after multiple rounds of design,
>>>> testing, code reviews, documentation reviews, and, more importantly,
>>>> validation that it runs at Scale!
>>>>
>>>>
>>>> Some facts about CEP-37.
>>>>
>>>>    -
>>>>
>>>>    Multiple members have verified all aspects of CEP-37 numerous times.
>>>>    -
>>>>
>>>>    The design proposed in CEP-37 has been thoroughly tried and tested
>>>>    on an immense scale (hundreds of unique Cassandra clusters, tens of
>>>>    thousands of Cassandra nodes, with tens of millions of QPS) on top of
>>>>    4.1 open-source for more than five years; please see more details here
>>>>    <https://www.uber.com/en-US/blog/how-uber-optimized-cassandra-operations-at-scale/>
>>>>    .
>>>>    -
>>>>
>>>>    The following presentation
>>>>    <https://docs.google.com/presentation/d/1Zilww9c7LihHULk_ckErI2s4XbObxjWknKqRtbvHyZc/edit#slide=id.g30a4fd4fcf7_0_13>
>>>>    highlights the rigor applied to CEP-37, which was given during last
>>>>    week’s Apache Cassandra Bay Area Meetup
>>>>    <https://www.meetup.com/apache-cassandra-bay-area/events/303469006/>
>>>>    .
>>>>
>>>>
>>>> Since things are massively overhauled, we believe it is almost ready
>>>> for a final pass pre-VOTE. We would like you to please review the
>>>> CEP-37
>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution)>
>>>> and the associated detailed design doc
>>>> <https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0>
>>>> .
>>>>
>>>> Thank you everyone!
>>>>
>>>> Chris, Andy, Josh, Dinesh, Kristijonas, and Jaydeep
>>>>
>>>>
>>>>
>>>> On Thu, Sep 19, 2024 at 11:26 AM Josh McKenzie <[email protected]>
>>>> wrote:
>>>>
>>>>> Not quite; finishing touches on the CEP and design doc are in flight
>>>>> (as of last / this week).
>>>>>
>>>>> Soon(tm).
>>>>>
>>>>> On Thu, Sep 19, 2024, at 2:07 PM, Patrick McFadin wrote:
>>>>>
>>>>> Is this CEP ready for a VOTE thread?
>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Unified+Repair+Solution
>>>>>
>>>>> On Sun, Feb 25, 2024 at 12:25 PM Jaydeep Chovatia <
>>>>> [email protected]> wrote:
>>>>>
>>>>> Thanks, Josh. I've just updated the CEP
>>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Official+Repair+Solution>
>>>>> and included all the solutions you mentioned below.
>>>>>
>>>>> Jaydeep
>>>>>
>>>>> On Thu, Feb 22, 2024 at 9:33 AM Josh McKenzie <[email protected]>
>>>>> wrote:
>>>>>
>>>>>
>>>>> Very late response from me here (basically necro'ing this thread).
>>>>>
>>>>> I think it'd be useful to get this condensed into a CEP that we can
>>>>> then discuss in that format. It's clearly something we all agree we need
>>>>> and having an implementation that works, even if it's not in your
>>>>> preferred execution domain, is vastly better than nothing IMO.
>>>>>
>>>>> I don't have cycles (nor background ;) ) to do that, but it sounds
>>>>> like you do Jaydeep given the implementation you have on a private fork +
>>>>> design.
>>>>>
>>>>> A non-exhaustive list of things that might be useful incorporating
>>>>> into or referencing from a CEP:
>>>>> Slack thread:
>>>>> https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
>>>>> Joey's old C* ticket:
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-14346
>>>>> Even older automatic repair scheduling:
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-10070
>>>>> Your design gdoc:
>>>>> https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0
>>>>> PR with automated repair:
>>>>> https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c
>>>>>
>>>>> My intuition is that we're all basically in agreement that this is
>>>>> something the DB needs, we're all willing to bikeshed for our personal
>>>>> preference on where it lives and how it's implemented, and at the end of
>>>>> the day, code talks. I don't think anyone's said they'll die on the hill
>>>>> of implementation details, so that feels like CEP time to me.
>>>>>
>>>>> If you were willing and able to get a CEP together for automated
>>>>> repair based on the above material, given you've done the work and have
>>>>> the proof points it's working at scale, I think this would be a *huge
>>>>> contribution* to the community.
>>>>>
>>>>> On Thu, Aug 24, 2023, at 7:26 PM, Jaydeep Chovatia wrote:
>>>>>
>>>>> Is anyone going to file an official CEP for this?
>>>>> As mentioned in this email thread, here is one of the solution's design
>>>>> doc
>>>>> <https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0>
>>>>> and source code on a private Apache Cassandra patch. Could you go through
>>>>> it and let me know what you think?
>>>>>
>>>>> Jaydeep
>>>>>
>>>>> On Wed, Aug 2, 2023 at 3:54 PM Jon Haddad <[email protected]>
>>>>> wrote:
>>>>>
>>>>> > That said I would happily support an effort to bring repair
>>>>> scheduling to the sidecar immediately. This has nothing blocking it, and
>>>>> would potentially enable the sidecar to provide an official repair
>>>>> scheduling solution that is compatible with current or even previous
>>>>> versions of the database.
>>>>>
>>>>> This is something I hadn't thought much about, and is a pretty good
>>>>> argument for using the sidecar initially.  There's a lot of deployments
>>>>> out there and having an official repair option would be a big win.
>>>>>
>>>>>
>>>>> On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
>>>>> > I agree that it would be ideal for Cassandra to have a repair
>>>>> scheduler in-DB.
>>>>> >
>>>>> > That said I would happily support an effort to bring repair
>>>>> scheduling to the sidecar immediately. This has nothing blocking it, and
>>>>> would potentially enable the sidecar to provide an official repair
>>>>> scheduling solution that is compatible with current or even previous
>>>>> versions of the database.
>>>>> >
>>>>> > Once TCM has landed, we’ll have much stronger primitives for repair
>>>>> orchestration in the database itself. But I don’t think that should block
>>>>> progress on a repair scheduling solution in the sidecar, and there is
>>>>> nothing that would prevent someone from continuing to use a sidecar-based
>>>>> solution in perpetuity if they preferred.
>>>>> >
>>>>> > - Scott
>>>>> >
>>>>> > > On Jul 26, 2023, at 3:25 PM, Jon Haddad <
>>>>> [email protected]> wrote:
>>>>> > >
>>>>> > > I'm 100% in favor of repair being part of the core DB, not the
>>>>> sidecar.  The current (and past) state of things where running the DB
>>>>> correctly *requires* running a separate process (either community
>>>>> maintained or official C* sidecar) is incredibly painful for folks.  The
>>>>> idea that your data integrity needs to be opt-in has never made sense to
>>>>> me from the perspective of either the product or the end user.
>>>>> > >
>>>>> > > I've worked with way too many teams that have either configured
>>>>> this incorrectly or not at all.
>>>>> > >
>>>>> > > Ideally Cassandra would ship with repair built in and on by
>>>>> default.  Power users can disable if they want to continue to maintain
>>>>> their own repair tooling for some reason.
>>>>> > >
>>>>> > > Jon
>>>>> > >
>>>>> > >> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
>>>>> > >> All,
>>>>> > >> We had a brief discussion in [2] about the Uber article [1] where
>>>>> they talk about having integrated repair into Cassandra and how great that
>>>>> is. I expressed my disappointment that they didn't work with the community
>>>>> on that (Uber, if you are listening time to make amends 🙂) and it turns
>>>>> out Joey already had the idea and wrote the code [3] - so I wanted to
>>>>> start a discussion to gauge interest and maybe how to revive that effort.
>>>>> > >> Thanks,
>>>>> > >> German
>>>>> > >> [1]
>>>>> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
>>>>> > >> [2]
>>>>> https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
>>>>> > >> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>>
