Re: [Discuss] Repair inside C*

Francisco Guerrero Mon, 21 Oct 2024 12:25:40 -0700

Like others have said, I was expecting the scheduling portion of repair to be
negligible. I was mostly curious if you had something handy that you could
quickly share.
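
For anyone following along, here is a minimal Java sketch of the kind of lightweight scheduling described in the quoted reply below: a single extra thread that periodically records repair status. Every name in it (`RepairStatusMonitor`, `tick`, the in-memory map standing in for a system table) is hypothetical and is not taken from the CEP-37 patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch only; not the CEP-37 implementation. */
public class RepairStatusMonitor {
    // Stand-in for a system table: node id -> last observed repair status.
    private final Map<String, String> statusTable = new ConcurrentHashMap<>();
    private final String nodeId;
    private final Supplier<String> statusProbe;

    // A single daemon thread is the only extra resource the monitor holds.
    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "repair-status-monitor");
            t.setDaemon(true);
            return t;
        });

    public RepairStatusMonitor(String nodeId, Supplier<String> statusProbe) {
        this.nodeId = nodeId;
        this.statusProbe = statusProbe;
    }

    /** One monitoring pass: read the current status and record it. */
    public void tick() {
        statusTable.put(nodeId, statusProbe.get());
    }

    /** Runs tick() repeatedly, waiting the given delay between passes. */
    public void start(long delay, TimeUnit unit) {
        executor.scheduleWithFixedDelay(() -> {
            try {
                tick();
            } catch (RuntimeException e) {
                // Swallow so one failed pass does not cancel later passes.
            }
        }, 0, delay, unit);
    }

    public void stop() {
        executor.shutdownNow();
    }

    public String lastStatus() {
        return statusTable.get(nodeId);
    }
}
```

Calling `start(5, TimeUnit.MINUTES)` would do one cheap write every few minutes, which is consistent with the claim that scheduling does not show up in CPU charts. The cost of the repair itself is a separate matter.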


On 2024/10/21 18:59:41 Jaydeep Chovatia wrote:
> >Jaydeep, do you have any metrics on your clusters comparing them before
> and after introducing repair scheduling into the Cassandra process?
> 
> Yes, I had made some comparisons when I started rolling this feature out to
> our production five years ago :)  Here are the details:
> *The Scheduling*
> The scheduling itself is exceptionally lightweight, as only one additional
> thread monitors the repair activity, updating the status to a system table
> once every few minutes or so. So, it does not appear anywhere in the CPU
> charts, etc. Unfortunately, I do not have those graphs now, but I can do a
> quick comparison if it helps!
> 
> *The Repair Itself*
> As we all know, the Cassandra repair algorithm is a heavy-weight process
> due to Merkle tree/streaming, etc., no matter how we schedule it. But it is
> an orthogonal topic and folks are already discussing creating a new CEP.
> 
> Jaydeep
> 
> 
> On Mon, Oct 21, 2024 at 10:02 AM Francisco Guerrero <[email protected]>
> wrote:
> 
> > Jaydeep, do you have any metrics on your clusters comparing them before
> > and after introducing repair scheduling into the Cassandra process?
> >
> > On 2024/10/21 16:57:57 "J. D. Jordan" wrote:
> > > Sounds good. Just wanted to bring it up. I agree that the scheduling bit is
> > > pretty lightweight and the ideal would be to bring the whole of the repair
> > > external, which is a much bigger can of worms to open.
> > >
> > >
> > >
> > > -Jeremiah
> > >
> > >
> > >
> > > > On Oct 21, 2024, at 11:21 AM, Chris Lohfink <[email protected]>
> > wrote:
> > > >
> > > >
> > >
> > > > 
> > > >
> > > > > I actually think we should be looking at how we can move things out
> > of the
> > > > database process.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > While worth pursuing, I think we would need a different CEP just to figure
> > > > out how to do that. Not only is there a lot of infrastructure difficulty in
> > > > running multi-process, the inter-app communication needs to be figured out
> > > > better than JMX. Even with the sidecar we don't have a solid story on how to
> > > > ensure both are running or anything yet. It's up to each app owner to figure
> > > > it out. Once we have a good thing in place I think we can start moving
> > > > compactions, repairs, etc. out of the database. Even then it's the _repairs_
> > > > that are expensive, not the scheduling.
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Oct 21, 2024 at 9:45 AM Jeremiah Jordan
> > > > <[[email protected]](mailto:[email protected])>
> > wrote:
> > > >
> > > >
> > >
> > > >> I love the idea of a repair service being there by default for an install
> > > >> of C*.  My main concern here is that it is putting more services into the main
> > > >> database process.  I actually think we should be looking at how we can move
> > > >> things out of the database process.  The C* process being a giant monolith has
> > > >> always been a pain point.  Is there any way it makes sense for this to be an
> > > >> external process rather than a new thread pool inside the C* process?
> > >
> > > >>
> > >
> > > >>
> > > >
> > > >>
> > >
> > > >> -Jeremiah Jordan
> > >
> > > >>
> > >
> > > >>
> > > >
> > > >>
> > >
> > > >> On Oct 18, 2024 at 2:58:15 PM, Mick Semb Wever
> > > <[[email protected]](mailto:[email protected])> wrote:
> > > >
> > > >>
> > >
> > > >>>
> > > >
> > > >>>
> > >
> > > >>> This is looking strong, thanks Jaydeep.
> > >
> > > >>>
> > >
> > > >>>
> > > >
> > > >>>
> > >
> > > >>> I would suggest folk take a look at the design doc and the PR in the
> > CEP.
> > > A lot is there (that I have completely missed).
> > >
> > > >>>
> > >
> > > >>>
> > > >
> > > >>>
> > >
> > > >>> I would especially ask all authors of prior art (Reaper, DSE nodesync,
> > > >>> ecchronos) to take a final review of the proposal.
> > > >
> > > >>>
> > >
> > > >>>
> > > >
> > > >>>
> > >
> > > >>> Jaydeep, can we ask for a two-week window while we reach out to these
> > > >>> people?  There's a lot of prior art in this space, and it feels like we're in
> > > >>> a good place now where it's clear this has legs and we can use that to bring
> > > >>> folk in and make sure there are no remaining blind spots.
> > >
> > > >>>
> > >
> > > >>>
> > > >
> > > >>>
> > >
> > > >>>
> > > >
> > > >>>
> > >
> > > >>> On Fri, 18 Oct 2024 at 01:40, Jaydeep Chovatia
> > > <[[email protected]](mailto:[email protected])>
> > wrote:
> > > >
> > > >>>
> > >
> > > >>>> Sorry, there is a typo in the CEP-37 link; here is the correct
> > > >>>> [link](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution)
> > >
> > > >>>>
> > >
> > > >>>>
> > > >
> > > >>>>
> > >
> > > >>>>
> > > >
> > > >>>>
> > >
> > > >>>> On Thu, Oct 17, 2024 at 4:36 PM Jaydeep Chovatia
> > > <[[email protected]](mailto:[email protected])>
> > wrote:
> > > >
> > > >>>>
> > >
> > > >>>>> First, thank you for your patience while we strengthened CEP-37.
> > >
> > > >>>>>
> > >
> > > >>>>>
> > > >
> > > >>>>>
> > >
> > > >>>>> Over the last eight months, Chris Lohfink, Andy Tolbert, Josh McKenzie,
> > > >>>>> Dinesh Joshi, Kristijonas Zalys, and I have done tons of work (online
> > > >>>>> discussions/a dedicated Slack channel #cassandra-repair-scheduling-cep37) to
> > > >>>>> come up with the best possible design that not only significantly simplifies
> > > >>>>> repair operations but also includes the most common features that everyone
> > > >>>>> will benefit from when running at scale.
> > >
> > > >>>>>
> > >
> > > >>>>> For example,
> > >
> > > >>>>>
> > >
> > > >>>>>   * Apache Cassandra must be capable of running multiple repair types,
> > > >>>>>     such as Full, Incremental, Paxos, and Preview, so the framework should be
> > > >>>>>     easily extendable with no additional overhead from the operator’s point of
> > > >>>>>     view.
> > > >>>>>
> > > >>>>>   * An easy way to extend the token-split calculation algorithm, with a
> > > >>>>>     default implementation, should exist.
> > > >>>>>
> > > >>>>>   * Running incremental repair reliably at scale is pretty challenging,
> > > >>>>>     so we need to place safeguards, such as migration/rollback without restart
> > > >>>>>     and stopping incremental repair automatically if the disk is about to get
> > > >>>>>     full.
> > >
> > > >>>>>
> > >
> > > >>>>>
> > >
> > > >>>>>
> > >
> > > >>>>> We are glad to inform you that CEP-37 (i.e., Repair inside Cassandra) is
> > > >>>>> now officially ready for review after multiple rounds of design, testing, code
> > > >>>>> reviews, documentation reviews, and, more importantly, validation that it runs
> > > >>>>> at scale!
> > >
> > > >>>>>
> > >
> > > >>>>>
> > > >
> > > >>>>>
> > >
> > > >>>>> Some facts about CEP-37.
> > >
> > > >>>>>
> > >
> > > >>>>>   * Multiple members have verified all aspects of CEP-37 numerous
> > times.
> > >
> > > >>>>>
> > >
> > > >>>>>   * The design proposed in CEP-37 has been thoroughly tried and tested
> > > >>>>>     on an immense scale (hundreds of unique Cassandra clusters, tens of thousands
> > > >>>>>     of Cassandra nodes, with tens of millions of QPS) on top of 4.1 open-source
> > > >>>>>     for more than five years; please see more details
> > > >>>>>     [here](https://www.uber.com/en-US/blog/how-uber-optimized-cassandra-operations-at-scale/).
> > >
> > > >>>>>
> > >
> > > >>>>>   * The following
> > > >>>>>     [presentation](https://docs.google.com/presentation/d/1Zilww9c7LihHULk_ckErI2s4XbObxjWknKqRtbvHyZc/edit#slide=id.g30a4fd4fcf7_0_13),
> > > >>>>>     given during last week’s Apache Cassandra Bay Area
> > > >>>>>     [Meetup](https://www.meetup.com/apache-cassandra-bay-area/events/303469006/),
> > > >>>>>     highlights the rigor applied to CEP-37.
> > >
> > > >>>>>
> > >
> > > >>>>>
> > >
> > >
> > > >
> > > >>>>>
> > >
> > > >>>>> Since things are massively overhauled, we believe it is almost ready for
> > > >>>>> a final pass pre-VOTE. We would like you to please review the
> > > >>>>> [CEP-37](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution)
> > > >>>>> and the associated detailed design
> > > >>>>> [doc](https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0).
> > >
> > > >>>>>
> > >
> > > >>>>>
> > > >
> > > >>>>>
> > >
> > > >>>>> Thank you everyone!
> > >
> > > >>>>>
> > >
> > > >>>>> Chris, Andy, Josh, Dinesh, Kristijonas, and Jaydeep
> > >
> > > >>>>>
> > >
> > > >>>>>
> > > >
> > > >
> > > >>>>>
> > >
> > > >>>>>
> > > >
> > > >>>>>
> > >
> > > >>>>> On Thu, Sep 19, 2024 at 11:26 AM Josh McKenzie
> > > <[[email protected]](mailto:[email protected])> wrote:
> > > >
> > > >>>>>
> > >
> > >
> > > >>>>>>
> > >
> > > >>>>>> Not quite; finishing touches on the CEP and design doc are in
> > flight
> > > (as of last / this week).
> > > >
> > > >>>>>>
> > >
> > > >>>>>>
> > > >
> > > >>>>>>
> > >
> > > >>>>>> Soon(tm).
> > >
> > > >>>>>>
> > >
> > > >>>>>>
> > > >
> > > >>>>>>
> > >
> > > >>>>>> On Thu, Sep 19, 2024, at 2:07 PM, Patrick McFadin wrote:
> > > >
> > > >>>>>>
> > >
> > > >>>>>>> Is this CEP ready for a VOTE thread?
> > > >>>>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Unified+Repair+Solution>
> >
> > > >
> > > >>>>>>>
> > >
> > > >>>>>>>
> > > >
> > > >>>>>>>
> > >
> > > >>>>>>> On Sun, Feb 25, 2024 at 12:25 PM Jaydeep Chovatia
> > > <[[email protected]](mailto:[email protected])>
> > wrote:
> > > >
> > > >>>>>>>
> > >
> > > >>>>>>>> Thanks, Josh. I've just updated the
> > > >>>>>>>> [CEP](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Official+Repair+Solution)
> > > >>>>>>>> and included all the solutions you mentioned below.
> > > >
> > > >>>>>>>>
> > >
> > > >>>>>>>>
> > > >
> > > >>>>>>>>
> > >
> > > >>>>>>>> Jaydeep
> > > >
> > > >>>>>>>>
> > >
> > > >>>>>>>>
> > > >
> > > >>>>>>>>
> > >
> > > >>>>>>>> On Thu, Feb 22, 2024 at 9:33 AM Josh McKenzie
> > > <[[email protected]](mailto:[email protected])> wrote:
> > > >
> > > >>>>>>>>
> > >
> > > >
> > > >>>>>>>>>
> > >
> > > >>>>>>>>> Very late response from me here (basically necro'ing this
> > thread).
> > > >
> > > >>>>>>>>>
> > >
> > > >>>>>>>>>
> > > >
> > > >>>>>>>>>
> > >
> > > >>>>>>>>> I think it'd be useful to get this condensed into a CEP that
> > we can
> > > then discuss in that format. It's clearly something we all agree we need
> > and
> > > having an implementation that works, even if it's not in your preferred
> > > execution domain, is vastly better than nothing IMO.
> > > >
> > > >>>>>>>>>
> > >
> > > >>>>>>>>>
> > > >
> > > >>>>>>>>>
> > >
> > > >>>>>>>>> I don't have cycles (nor background ;) ) to do that, but it
> > sounds
> > > like you do Jaydeep given the implementation you have on a private fork +
> > > design.
> > > >
> > > >>>>>>>>>
> > >
> > > >>>>>>>>>
> > > >
> > > >>>>>>>>>
> > >
> > > >>>>>>>>> A non-exhaustive list of things that might be useful to incorporate
> > > >>>>>>>>> into or reference from a CEP:
> > > >>>>>>>>>
> > > >>>>>>>>> Slack thread: <https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619>
> > > >>>>>>>>>
> > > >>>>>>>>> Joey's old C* ticket: <https://issues.apache.org/jira/browse/CASSANDRA-14346>
> > > >>>>>>>>>
> > > >>>>>>>>> Even older automatic repair scheduling: <https://issues.apache.org/jira/browse/CASSANDRA-10070>
> > > >>>>>>>>>
> > > >>>>>>>>> Your design gdoc: <https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0>
> > > >>>>>>>>>
> > > >>>>>>>>> PR with automated repair: <https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c>
> >
> > > >
> > > >>>>>>>>>
> > >
> > > >>>>>>>>>
> > > >
> > > >>>>>>>>>
> > >
> > > >>>>>>>>> My intuition is that we're all basically in agreement that
> > this is
> > > something the DB needs, we're all willing to bikeshed for our personal
> > > preference on where it lives and how it's implemented, and at the end of
> > the
> > > day, code talks. I don't think anyone's said they'll die on the hill of
> > > implementation details, so that feels like CEP time to me.
> > > >
> > > >>>>>>>>>
> > >
> > > >>>>>>>>>
> > > >
> > > >>>>>>>>>
> > >
> > > >>>>>>>>> If you were willing and able to get a CEP together for automated
> > > >>>>>>>>> repair based on the above material, given you've done the work and have the
> > > >>>>>>>>> proof points it's working at scale, I think this would be a
> > > >>>>>>>>> _huge contribution_ to the community.
> > > >
> > > >>>>>>>>>
> > >
> > > >>>>>>>>>
> > > >
> > > >>>>>>>>>
> > >
> > > >>>>>>>>> On Thu, Aug 24, 2023, at 7:26 PM, Jaydeep Chovatia wrote:
> > > >
> > > >>>>>>>>>
> > >
> > > >>>>>>>>>> Is anyone going to file an official CEP for this?
> > > >
> > > >>>>>>>>>>
> > >
> > > >>>>>>>>>> As mentioned in this email thread, here is one of the solution's
> > > >>>>>>>>>> [design doc](https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0)
> > > >>>>>>>>>> and source code on a private Apache Cassandra patch. Could you go through it
> > > >>>>>>>>>> and let me know what you think?
> > > >
> > > >>>>>>>>>>
> > >
> > > >>>>>>>>>>
> > > >
> > > >>>>>>>>>>
> > >
> > > >>>>>>>>>> Jaydeep
> > > >
> > > >>>>>>>>>>
> > >
> > > >>>>>>>>>>
> > > >
> > > >>>>>>>>>>
> > >
> > > >>>>>>>>>> On Wed, Aug 2, 2023 at 3:54 PM Jon Haddad
> > > <[[email protected]](mailto:[email protected])>
> > wrote:
> > > >
> > > >>>>>>>>>>
> > >
> > > >>>>>>>>>>> > That said I would happily support an effort to bring repair
> > > scheduling to the sidecar immediately. This has nothing blocking it, and
> > would
> > > potentially enable the sidecar to provide an official repair scheduling
> > > solution that is compatible with current or even previous versions of the
> > > database.
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>>
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> This is something I hadn't thought much about, and is a pretty
> > > >>>>>>>>>>> good argument for using the sidecar initially.  There are a lot of deployments
> > > >>>>>>>>>>> out there and having an official repair option would be a big win.
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>>
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>>
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > I agree that it would be ideal for Cassandra to have a
> > repair
> > > scheduler in-DB.
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> >
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > That said I would happily support an effort to bring repair
> > > scheduling to the sidecar immediately. This has nothing blocking it, and
> > would
> > > potentially enable the sidecar to provide an official repair scheduling
> > > solution that is compatible with current or even previous versions of the
> > > database.
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> >
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > Once TCM has landed, we’ll have much stronger primitives
> > for
> > > repair orchestration in the database itself. But I don’t think that
> > should
> > > block progress on a repair scheduling solution in the sidecar, and there
> > is
> > > nothing that would prevent someone from continuing to use a sidecar-based
> > > solution in perpetuity if they preferred.
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> >
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > - Scott
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> >
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > > On Jul 26, 2023, at 3:25 PM, Jon Haddad
> > > <[[email protected]](mailto:[email protected])>
> > wrote:
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > >
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > > I'm 100% in favor of repair being part of the core DB,
> > not
> > > the sidecar.  The current (and past) state of things where running the DB
> > > correctly *requires* running a separate process (either community
> > maintained
> > > or official C* sidecar) is incredibly painful for folks.  The idea that
> > your
> > > data integrity needs to be opt-in has never made sense to me from the
> > > perspective of either the product or the end user.
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > >
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > > I've worked with way too many teams that have either
> > > configured this incorrectly or not at all.
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > >
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > > Ideally Cassandra would ship with repair built in and on
> > by
> > > default.  Power users can disable if they want to continue to maintain
> > their
> > > own repair tooling for some reason.
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > >
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > > Jon
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > >
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > >> On 2023/07/24 20:44:14 German Eichberger via dev
> > wrote:
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > >> All,
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > >> We had a brief discussion in [2] about the Uber article [1]
> > > >>>>>>>>>>> > >> where they talk about having integrated repair into Cassandra and how great
> > > >>>>>>>>>>> > >> that is. I expressed my disappointment that they didn't work with the
> > > >>>>>>>>>>> > >> community on that (Uber, if you are listening, time to make amends 🙂) and it
> > > >>>>>>>>>>> > >> turns out Joey already had the idea and wrote the code [3], so I wanted to
> > > >>>>>>>>>>> > >> start a discussion to gauge interest and maybe how to revive that effort.
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > >> Thanks,
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > >> German
> > > >
> > > >>>>>>>>>>>
> > >
> > > >>>>>>>>>>> > >> [1] <https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/>
> > > >>>>>>>>>>> > >> [2] <https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619>
> > > >>>>>>>>>>> > >> [3] <https://issues.apache.org/jira/browse/CASSANDRA-14346>
> > > >
> > > >>>>>>>>>>>
> > >
