Agreed with the sentiment that decomposition is a good target but out of scope here. I’m personally excited to see an in-tree repair scheduler and am supportive of the approach shared here.
Jordan

On Tue, Oct 22, 2024 at 08:12 Dinesh Joshi <djo...@apache.org> wrote:

> Decomposing Cassandra may be architecturally desirable but that is not the goal of this CEP. This CEP brings value to operators today so it should be considered on that merit. We definitely need to have a separate conversation on Cassandra's architectural direction.
>
> On Tue, Oct 22, 2024 at 7:51 AM Joseph Lynch <joe.e.ly...@gmail.com> wrote:
>
>> Definitely like this in C* itself. We only changed our proposal to putting repair scheduling in the sidecar before because trunk was frozen for the foreseeable future at that time. With trunk unfrozen and development on the main process going at a fast pace I think it makes way more sense to integrate natively as table properties as this CEP proposes. Completely agree the scheduling overhead should be minimal.
>>
>> Moving the actual repair operation (comparing data and streaming mismatches) along with compaction operations to a separate process long term makes a lot of sense but imo only once we both have a release of sidecar and a contract figured out between them on communication. I'm watching CEP-38 there as I think CQL and virtual tables are looking much stronger than when we wrote CEP-1 and chose HTTP but that's for that discussion and not this one.
>>
>> -Joey
>>
>> On Mon, Oct 21, 2024 at 3:25 PM Francisco Guerrero <fran...@apache.org> wrote:
>>
>>> Like others have said, I was expecting the scheduling portion of repair to be negligible. I was mostly curious if you had something handy that you can quickly share.
>>>
>>> On 2024/10/21 18:59:41 Jaydeep Chovatia wrote:
>>> > > Jaydeep, do you have any metrics on your clusters comparing them before and after introducing repair scheduling into the Cassandra process?
>>> >
>>> > Yes, I had made some comparisons when I started rolling this feature out to our production five years ago :) Here are the details:
>>> >
>>> > *The Scheduling*
>>> > The scheduling itself is exceptionally lightweight, as only one additional thread monitors the repair activity, updating the status to a system table once every few minutes or so. So, it does not appear anywhere in the CPU charts, etc. Unfortunately, I do not have those graphs now, but I can do a quick comparison if it helps!
>>> >
>>> > *The Repair Itself*
>>> > As we all know, the Cassandra repair algorithm is a heavy-weight process due to Merkle tree/streaming, etc., no matter how we schedule it. But it is an orthogonal topic and folks are already discussing creating a new CEP.
>>> >
>>> > Jaydeep
>>> >
>>> > On Mon, Oct 21, 2024 at 10:02 AM Francisco Guerrero <fran...@apache.org> wrote:
>>> >
>>> > > Jaydeep, do you have any metrics on your clusters comparing them before and after introducing repair scheduling into the Cassandra process?
>>> > >
>>> > > On 2024/10/21 16:57:57 "J. D. Jordan" wrote:
>>> > > > Sounds good. Just wanted to bring it up. I agree that the scheduling bit is pretty light weight and the ideal would be to bring the whole of the repair external, which is a much bigger can of worms to open.
>>> > > >
>>> > > > -Jeremiah
>>> > > >
>>> > > > > On Oct 21, 2024, at 11:21 AM, Chris Lohfink <clohfin...@gmail.com> wrote:
>>> > > > >
>>> > > > > > I actually think we should be looking at how we can move things out of the database process.
>>> > > > >
>>> > > > > While worth pursuing, I think we would need a different CEP just to figure out how to do that. Not only is there a lot of infrastructure difficulty in running multi-process, the inter-app communication needs to be figured out better than JMX. Even for the sidecar we don't have a solid story on how to ensure both are running or anything yet. It's up to each app owner to figure it out. Once we have a good thing in place I think we can start moving compactions, repairs, etc. out of the database. Even then it's the _repairs_ that are expensive, not the scheduling.
>>> > > > >
>>> > > > > On Mon, Oct 21, 2024 at 9:45 AM Jeremiah Jordan <jeremiah.jor...@gmail.com> wrote:
>>> > > > >
>>> > > > >> I love the idea of a repair service being there by default for an install of C*. My main concern here is that it is putting more services into the main database process. I actually think we should be looking at how we can move things out of the database process. The C* process being a giant monolith has always been a pain point. Is there any way it makes sense for this to be an external process rather than a new thread pool inside the C* process?
>>> > > > >>
>>> > > > >> -Jeremiah Jordan
>>> > > > >>
>>> > > > >> On Oct 18, 2024 at 2:58:15 PM, Mick Semb Wever <m...@apache.org> wrote:
>>> > > > >>
>>> > > > >>> This is looking strong, thanks Jaydeep.
>>> > > > >>>
>>> > > > >>> I would suggest folk take a look at the design doc and the PR in the CEP. A lot is there (that I have completely missed).
>>> > > > >>>
>>> > > > >>> I would especially ask all authors of prior art (Reaper, DSE nodesync, ecchronos) to take a final review of the proposal.
>>> > > > >>>
>>> > > > >>> Jaydeep, can we ask for a two week window while we reach out to these people? There's a lot of prior art in this space, and it feels like we're in a good place now where it's clear this has legs and we can use that to bring folk in and make sure there's no remaining blindspots.
>>> > > > >>>
>>> > > > >>> On Fri, 18 Oct 2024 at 01:40, Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>> > > > >>>
>>> > > > >>>> Sorry, there is a typo in the CEP-37 link; here is the correct [link](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution)
>>> > > > >>>>
>>> > > > >>>> On Thu, Oct 17, 2024 at 4:36 PM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>> > > > >>>>
>>> > > > >>>>> First, thank you for your patience while we strengthened the CEP-37.
>>> > > > >>>>>
>>> > > > >>>>> Over the last eight months, Chris Lohfink, Andy Tolbert, Josh McKenzie, Dinesh Joshi, Kristijonas Zalys, and I have done tons of work (online discussions/a dedicated Slack channel #cassandra-repair-scheduling-cep37) to come up with the best possible design that not only significantly simplifies repair operations but also includes the most common features that everyone will benefit from running at scale.
>>> > > > >>>>>
>>> > > > >>>>> For example,
>>> > > > >>>>>
>>> > > > >>>>> * Apache Cassandra must be capable of running multiple repair types, such as Full, Incremental, Paxos, and Preview - so the framework should be easily extendable with no additional overhead from the operator’s point of view.
>>> > > > >>>>>
>>> > > > >>>>> * An easy way to extend the token-split calculation algorithm with a default implementation should exist.
>>> > > > >>>>>
>>> > > > >>>>> * Running incremental repair reliably at scale is pretty challenging, so we need to place safeguards, such as migration/rollback w/o restart and stopping incremental repair automatically if the disk is about to get full.
>>> > > > >>>>>
>>> > > > >>>>> We are glad to inform you that CEP-37 (i.e., Repair inside Cassandra) is now officially ready for review after multiple rounds of design, testing, code reviews, documentation reviews, and, more importantly, validation that it runs at scale!
>>> > > > >>>>>
>>> > > > >>>>> Some facts about CEP-37.
>>> > > > >>>>>
>>> > > > >>>>> * Multiple members have verified all aspects of CEP-37 numerous times.
>>> > > > >>>>>
>>> > > > >>>>> * The design proposed in CEP-37 has been thoroughly tried and tested on an immense scale (hundreds of unique Cassandra clusters, tens of thousands of Cassandra nodes, with tens of millions of QPS) on top of 4.1 open-source for more than five years; please see more details [here](https://www.uber.com/en-US/blog/how-uber-optimized-cassandra-operations-at-scale/).
>>> > > > >>>>>
>>> > > > >>>>> * The following [presentation](https://docs.google.com/presentation/d/1Zilww9c7LihHULk_ckErI2s4XbObxjWknKqRtbvHyZc/edit#slide=id.g30a4fd4fcf7_0_13), which was given during last week’s Apache Cassandra Bay Area [Meetup](https://www.meetup.com/apache-cassandra-bay-area/events/303469006/), highlights the rigor applied to CEP-37.
>>> > > > >>>>>
>>> > > > >>>>> Since things are massively overhauled, we believe it is almost ready for a final pass pre-VOTE. We would like you to please review the [CEP-37](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution) and the associated detailed design [doc](https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0).
>>> > > > >>>>>
>>> > > > >>>>> Thank you everyone!
>>> > > > >>>>>
>>> > > > >>>>> Chris, Andy, Josh, Dinesh, Kristijonas, and Jaydeep
>>> > > > >>>>>
>>> > > > >>>>> On Thu, Sep 19, 2024 at 11:26 AM Josh McKenzie <jmcken...@apache.org> wrote:
>>> > > > >>>>>
>>> > > > >>>>>> Not quite; finishing touches on the CEP and design doc are in flight (as of last / this week).
>>> > > > >>>>>>
>>> > > > >>>>>> Soon(tm).
>>> > > > >>>>>>
>>> > > > >>>>>> On Thu, Sep 19, 2024, at 2:07 PM, Patrick McFadin wrote:
>>> > > > >>>>>>
>>> > > > >>>>>>> Is this CEP ready for a VOTE thread? <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Unified+Repair+Solution>
>>> > > > >>>>>>>
>>> > > > >>>>>>> On Sun, Feb 25, 2024 at 12:25 PM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>> > > > >>>>>>>
>>> > > > >>>>>>>> Thanks, Josh. I've just updated the [CEP](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Official+Repair+Solution) and included all the solutions you mentioned below.
>>> > > > >>>>>>>>
>>> > > > >>>>>>>> Jaydeep
>>> > > > >>>>>>>>
>>> > > > >>>>>>>> On Thu, Feb 22, 2024 at 9:33 AM Josh McKenzie <jmcken...@apache.org> wrote:
>>> > > > >>>>>>>>
>>> > > > >>>>>>>>> Very late response from me here (basically necro'ing this thread).
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> I think it'd be useful to get this condensed into a CEP that we can then discuss in that format. It's clearly something we all agree we need and having an implementation that works, even if it's not in your preferred execution domain, is vastly better than nothing IMO.
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> I don't have cycles (nor background ;) ) to do that, but it sounds like you do Jaydeep given the implementation you have on a private fork + design.
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> A non-exhaustive list of things that might be useful incorporating into or referencing from a CEP:
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> Slack thread: <https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619>
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> Joey's old C* ticket: <https://issues.apache.org/jira/browse/CASSANDRA-14346>
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> Even older automatic repair scheduling: <https://issues.apache.org/jira/browse/CASSANDRA-10070>
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> Your design gdoc: <https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0>
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> PR with automated repair: <https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c>
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> My intuition is that we're all basically in agreement that this is something the DB needs, we're all willing to bikeshed for our personal preference on where it lives and how it's implemented, and at the end of the day, code talks. I don't think anyone's said they'll die on the hill of implementation details, so that feels like CEP time to me.
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> If you were willing and able to get a CEP together for automated repair based on the above material, given you've done the work and have the proof points it's working at scale, I think this would be a _huge contribution_ to the community.
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> On Thu, Aug 24, 2023, at 7:26 PM, Jaydeep Chovatia wrote:
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>>> Is anyone going to file an official CEP for this?
>>> > > > >>>>>>>>>>
>>> > > > >>>>>>>>>> As mentioned in this email thread, here is one of the solution's [design doc](https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0) and source code on a private Apache Cassandra patch. Could you go through it and let me know what you think?
>>> > > > >>>>>>>>>>
>>> > > > >>>>>>>>>> Jaydeep
>>> > > > >>>>>>>>>>
>>> > > > >>>>>>>>>> On Wed, Aug 2, 2023 at 3:54 PM Jon Haddad <rustyrazorbl...@apache.org> wrote:
>>> > > > >>>>>>>>>>
>>> > > > >>>>>>>>>>> > That said I would happily support an effort to bring repair scheduling to the sidecar immediately. This has nothing blocking it, and would potentially enable the sidecar to provide an official repair scheduling solution that is compatible with current or even previous versions of the database.
>>> > > > >>>>>>>>>>>
>>> > > > >>>>>>>>>>> This is something I hadn't thought much about, and is a pretty good argument for using the sidecar initially. There's a lot of deployments out there and having an official repair option would be a big win.
>>> > > > >>>>>>>>>>>
>>> > > > >>>>>>>>>>> On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
>>> > > > >>>>>>>>>>>
>>> > > > >>>>>>>>>>> > I agree that it would be ideal for Cassandra to have a repair scheduler in-DB.
>>> > > > >>>>>>>>>>> >
>>> > > > >>>>>>>>>>> > That said I would happily support an effort to bring repair scheduling to the sidecar immediately. This has nothing blocking it, and would potentially enable the sidecar to provide an official repair scheduling solution that is compatible with current or even previous versions of the database.
>>> > > > >>>>>>>>>>> >
>>> > > > >>>>>>>>>>> > Once TCM has landed, we’ll have much stronger primitives for repair orchestration in the database itself. But I don’t think that should block progress on a repair scheduling solution in the sidecar, and there is nothing that would prevent someone from continuing to use a sidecar-based solution in perpetuity if they preferred.
>>> > > > >>>>>>>>>>> >
>>> > > > >>>>>>>>>>> > - Scott
>>> > > > >>>>>>>>>>> >
>>> > > > >>>>>>>>>>> > > On Jul 26, 2023, at 3:25 PM, Jon Haddad <rustyrazorbl...@apache.org> wrote:
>>> > > > >>>>>>>>>>> > >
>>> > > > >>>>>>>>>>> > > I'm 100% in favor of repair being part of the core DB, not the sidecar. The current (and past) state of things where running the DB correctly *requires* running a separate process (either community maintained or official C* sidecar) is incredibly painful for folks. The idea that your data integrity needs to be opt-in has never made sense to me from the perspective of either the product or the end user.
>>> > > > >>>>>>>>>>> > >
>>> > > > >>>>>>>>>>> > > I've worked with way too many teams that have either configured this incorrectly or not at all.
>>> > > > >>>>>>>>>>> > >
>>> > > > >>>>>>>>>>> > > Ideally Cassandra would ship with repair built in and on by default. Power users can disable if they want to continue to maintain their own repair tooling for some reason.
>>> > > > >>>>>>>>>>> > >
>>> > > > >>>>>>>>>>> > > Jon
>>> > > > >>>>>>>>>>> > >
>>> > > > >>>>>>>>>>> > >> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
>>> > > > >>>>>>>>>>> > >>
>>> > > > >>>>>>>>>>> > >> All,
>>> > > > >>>>>>>>>>> > >>
>>> > > > >>>>>>>>>>> > >> We had a brief discussion in [2] about the Uber article [1] where they talk about having integrated repair into Cassandra and how great that is. I expressed my disappointment that they didn't work with the community on that (Uber, if you are listening time to make amends 🙂) and it turns out Joey already had the idea and wrote the code [3] - so I wanted to start a discussion to gauge interest and maybe how to revive that effort.
>>> > > > >>>>>>>>>>> > >>
>>> > > > >>>>>>>>>>> > >> Thanks,
>>> > > > >>>>>>>>>>> > >> German
>>> > > > >>>>>>>>>>> > >>
>>> > > > >>>>>>>>>>> > >> [1] <https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/>
>>> > > > >>>>>>>>>>> > >> [2] <https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619>
>>> > > > >>>>>>>>>>> > >> [3] <https://issues.apache.org/jira/browse/CASSANDRA-14346>