Definitely like this in C* itself. We only changed our earlier proposal to put repair scheduling in the sidecar because trunk was frozen for the foreseeable future at that time. With trunk unfrozen and development on the main process going at a fast pace, I think it makes way more sense to integrate natively as table properties, as this CEP proposes. Completely agree the scheduling overhead should be minimal.
Moving the actual repair operation (comparing data and streaming mismatches) along with compaction operations to a separate process long term makes a lot of sense, but imo only once we have both a release of sidecar and a contract figured out between them on communication. I'm watching CEP-38 there, as I think CQL and virtual tables are looking much stronger than when we wrote CEP-1 and chose HTTP, but that's for that discussion and not this one.
-Joey

On Mon, Oct 21, 2024 at 3:25 PM Francisco Guerrero <fran...@apache.org> wrote:
> Like others have said, I was expecting the scheduling portion of repair to be negligible. I was mostly curious if you had something handy that you can quickly share.
>
> On 2024/10/21 18:59:41 Jaydeep Chovatia wrote:
> > > Jaydeep, do you have any metrics on your clusters comparing them before and after introducing repair scheduling into the Cassandra process?
> >
> > Yes, I had made some comparisons when I started rolling this feature out to our production five years ago :) Here are the details:
> >
> > *The Scheduling*
> > The scheduling itself is exceptionally lightweight, as only one additional thread monitors the repair activity, updating the status to a system table once every few minutes or so. So, it does not appear anywhere in the CPU charts, etc. Unfortunately, I do not have those graphs now, but I can do a quick comparison if it helps!
> >
> > *The Repair Itself*
> > As we all know, the Cassandra repair algorithm is a heavy-weight process due to Merkle tree/streaming, etc., no matter how we schedule it. But it is an orthogonal topic and folks are already discussing creating a new CEP.
> >
> > Jaydeep
> >
> > On Mon, Oct 21, 2024 at 10:02 AM Francisco Guerrero <fran...@apache.org> wrote:
> > > Jaydeep, do you have any metrics on your clusters comparing them before and after introducing repair scheduling into the Cassandra process?
> > >
> > > On 2024/10/21 16:57:57 "J. D. Jordan" wrote:
> > > > Sounds good. Just wanted to bring it up. I agree that the scheduling bit is pretty light weight and the ideal would be to bring the whole of the repair external, which is a much bigger can of worms to open.
> > > >
> > > > -Jeremiah
> > > >
> > > > > On Oct 21, 2024, at 11:21 AM, Chris Lohfink <clohfin...@gmail.com> wrote:
> > > > >
> > > > > > I actually think we should be looking at how we can move things out of the database process.
> > > > >
> > > > > While worth pursuing, I think we would need a different CEP just to figure out how to do that. Not only is there a lot of infrastructure difficulty in running multi process, the inter app communication needs to be figured out better than JMX. Even the sidecar we don't have a solid story on how to ensure both are running or anything yet. It's up to each app owner to figure it out. Once we have a good thing in place I think we can start moving compactions, repairs, etc out of the database. Even then it's the _repairs_ that is expensive, not the scheduling.
> > > > >
> > > > > On Mon, Oct 21, 2024 at 9:45 AM Jeremiah Jordan <jeremiah.jor...@gmail.com> wrote:
> > > > >> I love the idea of a repair service being there by default for an install of C*. My main concern here is that it is putting more services into the main database process. I actually think we should be looking at how we can move things out of the database process. The C* process being a giant monolith has always been a pain point. Is there any way it makes sense for this to be an external process rather than a new thread pool inside the C* process?
> > > > >>
> > > > >> -Jeremiah Jordan
> > > > >>
> > > > >> On Oct 18, 2024 at 2:58:15 PM, Mick Semb Wever <m...@apache.org> wrote:
> > > > >>> This is looking strong, thanks Jaydeep.
> > > > >>>
> > > > >>> I would suggest folk take a look at the design doc and the PR in the CEP. A lot is there (that I have completely missed).
> > > > >>>
> > > > >>> I would especially ask all authors of prior art (Reaper, DSE nodesync, ecchronos) to take a final review of the proposal.
> > > > >>>
> > > > >>> Jaydeep, can we ask for a two week window while we reach out to these people? There's a lot of prior art in this space, and it feels like we're in a good place now where it's clear this has legs and we can use that to bring folk in and make sure there's no remaining blindspots.
> > > > >>>
> > > > >>> On Fri, 18 Oct 2024 at 01:40, Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
> > > > >>>> Sorry, there is a typo in the CEP-37 link; here is the correct [link](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution)
> > > > >>>>
> > > > >>>> On Thu, Oct 17, 2024 at 4:36 PM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
> > > > >>>>> First, thank you for your patience while we strengthened the CEP-37.
> > > > >>>>>
> > > > >>>>> Over the last eight months, Chris Lohfink, Andy Tolbert, Josh McKenzie, Dinesh Joshi, Kristijonas Zalys, and I have done tons of work (online discussions/a dedicated Slack channel #cassandra-repair-scheduling-cep37) to come up with the best possible design that not only significantly simplifies repair operations but also includes the most common features that everyone will benefit from running at Scale.
> > > > >>>>>
> > > > >>>>> For example,
> > > > >>>>>
> > > > >>>>> * Apache Cassandra must be capable of running multiple repair types, such as Full, Incremental, Paxos, and Preview - so the framework should be easily extendable with no additional overhead from the operator’s point of view.
> > > > >>>>> * An easy way to extend the token-split calculation algorithm with a default implementation should exist.
> > > > >>>>> * Running incremental repair reliably at Scale is pretty challenging, so we need to place safeguards, such as migration/rollback w/o restart and stopping incremental repair automatically if the disk is about to get full.
> > > > >>>>>
> > > > >>>>> We are glad to inform you that CEP-37 (i.e., Repair inside Cassandra) is now officially ready for review after multiple rounds of design, testing, code reviews, documentation reviews, and, more importantly, validation that it runs at Scale!
> > > > >>>>>
> > > > >>>>> Some facts about CEP-37.
> > > > >>>>>
> > > > >>>>> * Multiple members have verified all aspects of CEP-37 numerous times.
> > > > >>>>> * The design proposed in CEP-37 has been thoroughly tried and tested on an immense scale (hundreds of unique Cassandra clusters, tens of thousands of Cassandra nodes, with tens of millions of QPS) on top of 4.1 open-source for more than five years; please see more details [here](https://www.uber.com/en-US/blog/how-uber-optimized-cassandra-operations-at-scale/).
> > > > >>>>> * The following [presentation](https://docs.google.com/presentation/d/1Zilww9c7LihHULk_ckErI2s4XbObxjWknKqRtbvHyZc/edit#slide=id.g30a4fd4fcf7_0_13) highlights the rigor applied to CEP-37; it was given during last week’s Apache Cassandra Bay Area [Meetup](https://www.meetup.com/apache-cassandra-bay-area/events/303469006/).
> > > > >>>>>
> > > > >>>>> Since things are massively overhauled, we believe it is almost ready for a final pass pre-VOTE. We would like you to please review the [CEP-37](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution\)) and the associated detailed design [doc](https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0).
> > > > >>>>>
> > > > >>>>> Thank you everyone!
> > > > >>>>>
> > > > >>>>> Chris, Andy, Josh, Dinesh, Kristijonas, and Jaydeep
> > > > >>>>>
> > > > >>>>> On Thu, Sep 19, 2024 at 11:26 AM Josh McKenzie <jmcken...@apache.org> wrote:
> > > > >>>>>> Not quite; finishing touches on the CEP and design doc are in flight (as of last / this week).
> > > > >>>>>>
> > > > >>>>>> Soon(tm).
> > > > >>>>>>
> > > > >>>>>> On Thu, Sep 19, 2024, at 2:07 PM, Patrick McFadin wrote:
> > > > >>>>>>> Is this CEP ready for a VOTE thread? <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Unified+Repair+Solution>
> > > > >>>>>>>
> > > > >>>>>>> On Sun, Feb 25, 2024 at 12:25 PM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
> > > > >>>>>>>> Thanks, Josh. I've just updated the [CEP](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Official+Repair+Solution) and included all the solutions you mentioned below.
> > > > >>>>>>>>
> > > > >>>>>>>> Jaydeep
> > > > >>>>>>>>
> > > > >>>>>>>> On Thu, Feb 22, 2024 at 9:33 AM Josh McKenzie <jmcken...@apache.org> wrote:
> > > > >>>>>>>>> Very late response from me here (basically necro'ing this thread).
> > > > >>>>>>>>>
> > > > >>>>>>>>> I think it'd be useful to get this condensed into a CEP that we can then discuss in that format. It's clearly something we all agree we need and having an implementation that works, even if it's not in your preferred execution domain, is vastly better than nothing IMO.
> > > > >>>>>>>>>
> > > > >>>>>>>>> I don't have cycles (nor background ;) ) to do that, but it sounds like you do Jaydeep given the implementation you have on a private fork + design.
> > > > >>>>>>>>>
> > > > >>>>>>>>> A non-exhaustive list of things that might be useful incorporating into or referencing from a CEP:
> > > > >>>>>>>>>
> > > > >>>>>>>>> Slack thread: <https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619>
> > > > >>>>>>>>> Joey's old C* ticket: <https://issues.apache.org/jira/browse/CASSANDRA-14346>
> > > > >>>>>>>>> Even older automatic repair scheduling: <https://issues.apache.org/jira/browse/CASSANDRA-10070>
> > > > >>>>>>>>> Your design gdoc: <https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0>
> > > > >>>>>>>>> PR with automated repair: <https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c>
> > > > >>>>>>>>>
> > > > >>>>>>>>> My intuition is that we're all basically in agreement that this is something the DB needs, we're all willing to bikeshed for our personal preference on where it lives and how it's implemented, and at the end of the day, code talks. I don't think anyone's said they'll die on the hill of implementation details, so that feels like CEP time to me.
> > > > >>>>>>>>>
> > > > >>>>>>>>> If you were willing and able to get a CEP together for automated repair based on the above material, given you've done the work and have the proof points it's working at scale, I think this would be a _huge contribution_ to the community.
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Thu, Aug 24, 2023, at 7:26 PM, Jaydeep Chovatia wrote:
> > > > >>>>>>>>>> Is anyone going to file an official CEP for this?
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> As mentioned in this email thread, here is one of the solution's [design doc](https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0) and source code on a private Apache Cassandra patch. Could you go through it and let me know what you think?
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Jaydeep
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> On Wed, Aug 2, 2023 at 3:54 PM Jon Haddad <rustyrazorbl...@apache.org> wrote:
> > > > >>>>>>>>>>> > That said I would happily support an effort to bring repair scheduling to the sidecar immediately. This has nothing blocking it, and would potentially enable the sidecar to provide an official repair scheduling solution that is compatible with current or even previous versions of the database.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> This is something I hadn't thought much about, and is a pretty good argument for using the sidecar initially. There's a lot of deployments out there and having an official repair option would be a big win.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
> > > > >>>>>>>>>>> > I agree that it would be ideal for Cassandra to have a repair scheduler in-DB.
> > > > >>>>>>>>>>> >
> > > > >>>>>>>>>>> > That said I would happily support an effort to bring repair scheduling to the sidecar immediately. This has nothing blocking it, and would potentially enable the sidecar to provide an official repair scheduling solution that is compatible with current or even previous versions of the database.
> > > > >>>>>>>>>>> >
> > > > >>>>>>>>>>> > Once TCM has landed, we’ll have much stronger primitives for repair orchestration in the database itself. But I don’t think that should block progress on a repair scheduling solution in the sidecar, and there is nothing that would prevent someone from continuing to use a sidecar-based solution in perpetuity if they preferred.
> > > > >>>>>>>>>>> >
> > > > >>>>>>>>>>> > - Scott
> > > > >>>>>>>>>>> >
> > > > >>>>>>>>>>> > > On Jul 26, 2023, at 3:25 PM, Jon Haddad <rustyrazorbl...@apache.org> wrote:
> > > > >>>>>>>>>>> > >
> > > > >>>>>>>>>>> > > I'm 100% in favor of repair being part of the core DB, not the sidecar. The current (and past) state of things where running the DB correctly *requires* running a separate process (either community maintained or official C* sidecar) is incredibly painful for folks. The idea that your data integrity needs to be opt-in has never made sense to me from the perspective of either the product or the end user.
> > > > >>>>>>>>>>> > >
> > > > >>>>>>>>>>> > > I've worked with way too many teams that have either configured this incorrectly or not at all.
> > > > >>>>>>>>>>> > >
> > > > >>>>>>>>>>> > > Ideally Cassandra would ship with repair built in and on by default. Power users can disable if they want to continue to maintain their own repair tooling for some reason.
> > > > >>>>>>>>>>> > >
> > > > >>>>>>>>>>> > > Jon
> > > > >>>>>>>>>>> > >
> > > > >>>>>>>>>>> > >> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
> > > > >>>>>>>>>>> > >> All,
> > > > >>>>>>>>>>> > >> We had a brief discussion in [2] about the Uber article [1] where they talk about having integrated repair into Cassandra and how great that is. I expressed my disappointment that they didn't work with the community on that (Uber, if you are listening time to make amends 🙂) and it turns out Joey already had the idea and wrote the code [3] - so I wanted to start a discussion to gauge interest and maybe how to revive that effort.
> > > > >>>>>>>>>>> > >> Thanks,
> > > > >>>>>>>>>>> > >> German
> > > > >>>>>>>>>>> > >> [1] <https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/>
> > > > >>>>>>>>>>> > >> [2] <https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619>
> > > > >>>>>>>>>>> > >> [3] <https://issues.apache.org/jira/browse/CASSANDRA-14346>