As others have said, I expected the scheduling portion of repair to be negligible. I was mostly curious whether you had something handy that you could quickly share.

On 2024/10/21 18:59:41 Jaydeep Chovatia wrote:
> > Jaydeep, do you have any metrics on your clusters comparing them before and after introducing repair scheduling into the Cassandra process?
>
> Yes, I had made some comparisons when I started rolling this feature out to our production five years ago :) Here are the details:
>
> *The Scheduling*
> The scheduling itself is exceptionally lightweight, as only one additional thread monitors the repair activity, updating the status to a system table once every few minutes or so. So, it does not appear anywhere in the CPU charts, etc. Unfortunately, I do not have those graphs now, but I can do a quick comparison if it helps!
>
> *The Repair Itself*
> As we all know, the Cassandra repair algorithm is a heavy-weight process due to Merkle tree/streaming, etc., no matter how we schedule it. But it is an orthogonal topic and folks are already discussing creating a new CEP.
>
> Jaydeep
>
> On Mon, Oct 21, 2024 at 10:02 AM Francisco Guerrero <fran...@apache.org> wrote:
>
> > Jaydeep, do you have any metrics on your clusters comparing them before and after introducing repair scheduling into the Cassandra process?
> >
> > On 2024/10/21 16:57:57 "J. D. Jordan" wrote:
> > > Sounds good. Just wanted to bring it up. I agree that the scheduling bit is pretty light weight and the ideal would be to bring the whole of the repair external, which is a much bigger can of worms to open.
> > >
> > > -Jeremiah
> > >
> > > > On Oct 21, 2024, at 11:21 AM, Chris Lohfink <clohfin...@gmail.com> wrote:
> > > >
> > > >> I actually think we should be looking at how we can move things out of the database process.
> > > >
> > > > While worth pursuing, I think we would need a different CEP just to figure out how to do that.
> > > > Not only is there a lot of infrastructure difficulty in running multi process, the inter app communication needs to be figured out better than JMX. Even the sidecar we don't have a solid story on how to ensure both are running or anything yet. It's up to each app owner to figure it out. Once we have a good thing in place I think we can start moving compactions, repairs, etc out of the database. Even then it's the _repairs_ that is expensive, not the scheduling.
> > > >
> > > > On Mon, Oct 21, 2024 at 9:45 AM Jeremiah Jordan <[jeremiah.jor...@gmail.com](mailto:jeremiah.jor...@gmail.com)> wrote:
> > > >
> > > >> I love the idea of a repair service being there by default for an install of C*. My main concern here is that it is putting more services into the main database process. I actually think we should be looking at how we can move things out of the database process. The C* process being a giant monolith has always been a pain point. Is there any way it makes sense for this to be an external process rather than a new thread pool inside the C* process?
> > > >>
> > > >> -Jeremiah Jordan
> > > >>
> > > >> On Oct 18, 2024 at 2:58:15 PM, Mick Semb Wever <[m...@apache.org](mailto:m...@apache.org)> wrote:
> > > >>
> > > >>> This is looking strong, thanks Jaydeep.
> > > >>>
> > > >>> I would suggest folk take a look at the design doc and the PR in the CEP. A lot is there (that I have completely missed).
> > > >>>
> > > >>> I would especially ask all authors of prior art (Reaper, DSE nodesync, ecchronos) to take a final review of the proposal
> > > >>>
> > > >>> Jaydeep, can we ask for a two week window while we reach out to these people? There's a lot of prior art in this space, and it feels like we're in a good place now where it's clear this has legs and we can use that to bring folk in and make sure there's no remaining blindspots.
> > > >>>
> > > >>> On Fri, 18 Oct 2024 at 01:40, Jaydeep Chovatia <[chovatia.jayd...@gmail.com](mailto:chovatia.jayd...@gmail.com)> wrote:
> > > >>>
> > > >>>> Sorry, there is a typo in the CEP-37 link; here is the correct [link](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution)
> > > >>>>
> > > >>>> On Thu, Oct 17, 2024 at 4:36 PM Jaydeep Chovatia <[chovatia.jayd...@gmail.com](mailto:chovatia.jayd...@gmail.com)> wrote:
> > > >>>>
> > > >>>>> First, thank you for your patience while we strengthened the CEP-37.
> > > >>>>>
> > > >>>>> Over the last eight months, Chris Lohfink, Andy Tolbert, Josh McKenzie, Dinesh Joshi, Kristijonas Zalys, and I have done tons of work (online discussions/a dedicated Slack channel #cassandra-repair-scheduling-cep37) to come up with the best possible design that not only significantly simplifies repair operations but also includes the most common features that everyone will benefit from running at Scale.
> > > >>>>>
> > > >>>>> For example,
> > > >>>>>
> > > >>>>> * Apache Cassandra must be capable of running multiple repair types, such as Full, Incremental, Paxos, and Preview - so the framework should be easily extendable with no additional overhead from the operator’s point of view.
> > > >>>>>
> > > >>>>> * An easy way to extend the token-split calculation algorithm with a default implementation should exist.
> > > >>>>>
> > > >>>>> * Running incremental repair reliably at Scale is pretty challenging, so we need to place safeguards, such as migration/rollback w/o restart and stopping incremental repair automatically if the disk is about to get full.
> > > >>>>>
> > > >>>>> We are glad to inform you that CEP-37 (i.e., Repair inside Cassandra) is now officially ready for review after multiple rounds of design, testing, code reviews, documentation reviews, and, more importantly, validation that it runs at Scale!
> > > >>>>>
> > > >>>>> Some facts about CEP-37.
> > > >>>>>
> > > >>>>> * Multiple members have verified all aspects of CEP-37 numerous times.
> > > >>>>>
> > > >>>>> * The design proposed in CEP-37 has been thoroughly tried and tested on an immense scale (hundreds of unique Cassandra clusters, tens of thousands of Cassandra nodes, with tens of millions of QPS) on top of 4.1 open-source for more than five years; please see more details [here](https://www.uber.com/en-US/blog/how-uber-optimized-cassandra-operations-at-scale/).
> > > >>>>>
> > > >>>>> * The following [presentation](https://docs.google.com/presentation/d/1Zilww9c7LihHULk_ckErI2s4XbObxjWknKqRtbvHyZc/edit#slide=id.g30a4fd4fcf7_0_13) highlights the rigor applied to CEP-37, which was given during last week’s Apache Cassandra Bay Area [Meetup](https://www.meetup.com/apache-cassandra-bay-area/events/303469006/).
> > > >>>>>
> > > >>>>> Since things are massively overhauled, we believe it is almost ready for a final pass pre-VOTE. We would like you to please review the [CEP-37](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution) and the associated detailed design [doc](https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0).
> > > >>>>>
> > > >>>>> Thank you everyone!
> > > >>>>>
> > > >>>>> Chris, Andy, Josh, Dinesh, Kristijonas, and Jaydeep
> > > >>>>>
> > > >>>>> On Thu, Sep 19, 2024 at 11:26 AM Josh McKenzie <[jmcken...@apache.org](mailto:jmcken...@apache.org)> wrote:
> > > >>>>>
> > > >>>>>> Not quite; finishing touches on the CEP and design doc are in flight (as of last / this week).
> > > >>>>>>
> > > >>>>>> Soon(tm).
> > > >>>>>>
> > > >>>>>> On Thu, Sep 19, 2024, at 2:07 PM, Patrick McFadin wrote:
> > > >>>>>>
> > > >>>>>>> Is this CEP ready for a VOTE thread?
> > > >>>>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Unified+Repair+Solution>
> > > >>>>>>>
> > > >>>>>>> On Sun, Feb 25, 2024 at 12:25 PM Jaydeep Chovatia <[chovatia.jayd...@gmail.com](mailto:chovatia.jayd...@gmail.com)> wrote:
> > > >>>>>>>
> > > >>>>>>>> Thanks, Josh. I've just updated the [CEP](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Official+Repair+Solution) and included all the solutions you mentioned below.
> > > >>>>>>>>
> > > >>>>>>>> Jaydeep
> > > >>>>>>>>
> > > >>>>>>>> On Thu, Feb 22, 2024 at 9:33 AM Josh McKenzie <[jmcken...@apache.org](mailto:jmcken...@apache.org)> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Very late response from me here (basically necro'ing this thread).
> > > >>>>>>>>>
> > > >>>>>>>>> I think it'd be useful to get this condensed into a CEP that we can then discuss in that format. It's clearly something we all agree we need and having an implementation that works, even if it's not in your preferred execution domain, is vastly better than nothing IMO.
> > > >>>>>>>>>
> > > >>>>>>>>> I don't have cycles (nor background ;) ) to do that, but it sounds like you do Jaydeep given the implementation you have on a private fork + design.
> > > >>>>>>>>>
> > > >>>>>>>>> A non-exhaustive list of things that might be useful incorporating into or referencing from a CEP:
> > > >>>>>>>>>
> > > >>>>>>>>> Slack thread: <https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619>
> > > >>>>>>>>>
> > > >>>>>>>>> Joey's old C* ticket: <https://issues.apache.org/jira/browse/CASSANDRA-14346>
> > > >>>>>>>>>
> > > >>>>>>>>> Even older automatic repair scheduling: <https://issues.apache.org/jira/browse/CASSANDRA-10070>
> > > >>>>>>>>>
> > > >>>>>>>>> Your design gdoc: <https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0>
> > > >>>>>>>>>
> > > >>>>>>>>> PR with automated repair: <https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c>
> > > >>>>>>>>>
> > > >>>>>>>>> My intuition is that we're all basically in agreement that this is something the DB needs, we're all willing to bikeshed for our personal preference on where it lives and how it's implemented, and at the end of the day, code talks. I don't think anyone's said they'll die on the hill of implementation details, so that feels like CEP time to me.
> > > >>>>>>>>>
> > > >>>>>>>>> If you were willing and able to get a CEP together for automated repair based on the above material, given you've done the work and have the proof points it's working at scale, I think this would be a _huge contribution_ to the community.
> > > >>>>>>>>>
> > > >>>>>>>>> On Thu, Aug 24, 2023, at 7:26 PM, Jaydeep Chovatia wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Is anyone going to file an official CEP for this?
> > > >>>>>>>>>>
> > > >>>>>>>>>> As mentioned in this email thread, here is one of the solution's [design doc](https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0) and source code on a private Apache Cassandra patch. Could you go through it and let me know what you think?
> > > >>>>>>>>>>
> > > >>>>>>>>>> Jaydeep
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Wed, Aug 2, 2023 at 3:54 PM Jon Haddad <[rustyrazorbl...@apache.org](mailto:rustyrazorbl...@apache.org)> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> > That said I would happily support an effort to bring repair scheduling to the sidecar immediately. This has nothing blocking it, and would potentially enable the sidecar to provide an official repair scheduling solution that is compatible with current or even previous versions of the database.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> This is something I hadn't thought much about, and is a pretty good argument for using the sidecar initially. There's a lot of deployments out there and having an official repair option would be a big win.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> > I agree that it would be ideal for Cassandra to have a repair scheduler in-DB.
> > > >>>>>>>>>>> >
> > > >>>>>>>>>>> > That said I would happily support an effort to bring repair scheduling to the sidecar immediately. This has nothing blocking it, and would potentially enable the sidecar to provide an official repair scheduling solution that is compatible with current or even previous versions of the database.
> > > >>>>>>>>>>> >
> > > >>>>>>>>>>> > Once TCM has landed, we’ll have much stronger primitives for repair orchestration in the database itself. But I don’t think that should block progress on a repair scheduling solution in the sidecar, and there is nothing that would prevent someone from continuing to use a sidecar-based solution in perpetuity if they preferred.
> > > >>>>>>>>>>> >
> > > >>>>>>>>>>> > - Scott
> > > >>>>>>>>>>> >
> > > >>>>>>>>>>> > > On Jul 26, 2023, at 3:25 PM, Jon Haddad <[rustyrazorbl...@apache.org](mailto:rustyrazorbl...@apache.org)> wrote:
> > > >>>>>>>>>>> > >
> > > >>>>>>>>>>> > > I'm 100% in favor of repair being part of the core DB, not the sidecar. The current (and past) state of things where running the DB correctly *requires* running a separate process (either community maintained or official C* sidecar) is incredibly painful for folks. The idea that your data integrity needs to be opt-in has never made sense to me from the perspective of either the product or the end user.
> > > >>>>>>>>>>> > >
> > > >>>>>>>>>>> > > I've worked with way too many teams that have either configured this incorrectly or not at all.
> > > >>>>>>>>>>> > >
> > > >>>>>>>>>>> > > Ideally Cassandra would ship with repair built in and on by default. Power users can disable if they want to continue to maintain their own repair tooling for some reason.
> > > >>>>>>>>>>> > >
> > > >>>>>>>>>>> > > Jon
> > > >>>>>>>>>>> > >
> > > >>>>>>>>>>> > >> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
> > > >>>>>>>>>>> > >>
> > > >>>>>>>>>>> > >> All,
> > > >>>>>>>>>>> > >>
> > > >>>>>>>>>>> > >> We had a brief discussion in [2] about the Uber article [1] where they talk about having integrated repair into Cassandra and how great that is. I expressed my disappointment that they didn't work with the community on that (Uber, if you are listening time to make amends 🙂) and it turns out Joey already had the idea and wrote the code [3] - so I wanted to start a discussion to gauge interest and maybe how to revive that effort.
> > > >>>>>>>>>>> > >>
> > > >>>>>>>>>>> > >> Thanks,
> > > >>>>>>>>>>> > >>
> > > >>>>>>>>>>> > >> German
> > > >>>>>>>>>>> > >>
> > > >>>>>>>>>>> > >> [1] <https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/>
> > > >>>>>>>>>>> > >>
> > > >>>>>>>>>>> > >> [2] <https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619>
> > > >>>>>>>>>>> > >>
> > > >>>>>>>>>>> > >> [3] <https://issues.apache.org/jira/browse/CASSANDRA-14346>