Thanks, Josh. I've just updated the CEP <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Official+Repair+Solution> and included all the solutions you mentioned below.

Jaydeep

On Thu, Feb 22, 2024 at 9:33 AM Josh McKenzie <jmcken...@apache.org> wrote:

> Very late response from me here (basically necro'ing this thread).
>
> I think it'd be useful to get this condensed into a CEP that we can then
> discuss in that format. It's clearly something we all agree we need and
> having an implementation that works, even if it's not in your preferred
> execution domain, is vastly better than nothing IMO.
>
> I don't have cycles (nor background ;) ) to do that, but it sounds like
> you do Jaydeep given the implementation you have on a private fork + design.
>
> A non-exhaustive list of things that might be useful incorporating into or
> referencing from a CEP:
> Slack thread:
> https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
> Joey's old C* ticket:
> https://issues.apache.org/jira/browse/CASSANDRA-14346
> Even older automatic repair scheduling:
> https://issues.apache.org/jira/browse/CASSANDRA-10070
> Your design gdoc:
> https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0
> PR with automated repair:
> https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c
>
> My intuition is that we're all basically in agreement that this is
> something the DB needs, we're all willing to bikeshed for our personal
> preference on where it lives and how it's implemented, and at the end of
> the day, code talks. I don't think anyone's said they'll die on the hill of
> implementation details, so that feels like CEP time to me.
>
> If you were willing and able to get a CEP together for automated repair
> based on the above material, given you've done the work and have the proof
> points it's working at scale, I think this would be a *huge contribution*
> to the community.
>
> On Thu, Aug 24, 2023, at 7:26 PM, Jaydeep Chovatia wrote:
>
> Is anyone going to file an official CEP for this?
> As mentioned in this email thread, here is one of the solution's design doc
> <https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0>
> and source code on a private Apache Cassandra patch. Could you go through
> it and let me know what you think?
>
> Jaydeep
>
> On Wed, Aug 2, 2023 at 3:54 PM Jon Haddad <rustyrazorbl...@apache.org> wrote:
>
> > That said I would happily support an effort to bring repair scheduling
> > to the sidecar immediately. This has nothing blocking it, and would
> > potentially enable the sidecar to provide an official repair scheduling
> > solution that is compatible with current or even previous versions of the
> > database.
>
> This is something I hadn't thought much about, and is a pretty good
> argument for using the sidecar initially. There's a lot of deployments out
> there and having an official repair option would be a big win.
>
>
> On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
> > I agree that it would be ideal for Cassandra to have a repair scheduler
> > in-DB.
> >
> > That said I would happily support an effort to bring repair scheduling
> > to the sidecar immediately. This has nothing blocking it, and would
> > potentially enable the sidecar to provide an official repair scheduling
> > solution that is compatible with current or even previous versions of the
> > database.
> >
> > Once TCM has landed, we’ll have much stronger primitives for repair
> > orchestration in the database itself. But I don’t think that should block
> > progress on a repair scheduling solution in the sidecar, and there is
> > nothing that would prevent someone from continuing to use a sidecar-based
> > solution in perpetuity if they preferred.
> >
> > - Scott
> >
> > > On Jul 26, 2023, at 3:25 PM, Jon Haddad <rustyrazorbl...@apache.org> wrote:
> > >
> > > I'm 100% in favor of repair being part of the core DB, not the
> > > sidecar.
> > > The current (and past) state of things where running the DB
> > > correctly *requires* running a separate process (either community
> > > maintained or official C* sidecar) is incredibly painful for folks. The
> > > idea that your data integrity needs to be opt-in has never made sense to me
> > > from the perspective of either the product or the end user.
> > >
> > > I've worked with way too many teams that have either configured this
> > > incorrectly or not at all.
> > >
> > > Ideally Cassandra would ship with repair built in and on by default.
> > > Power users can disable if they want to continue to maintain their own
> > > repair tooling for some reason.
> > >
> > > Jon
> > >
> > >> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
> > >> All,
> > >> We had a brief discussion in [2] about the Uber article [1] where
> > >> they talk about having integrated repair into Cassandra and how great that
> > >> is. I expressed my disappointment that they didn't work with the community
> > >> on that (Uber, if you are listening time to make amends 🙂) and it turns
> > >> out Joey already had the idea and wrote the code [3] - so I wanted to start
> > >> a discussion to gauge interest and maybe how to revive that effort.
> > >> Thanks,
> > >> German
> > >> [1] https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
> > >> [2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
> > >> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346