> The only sub-proposal I’m particularly unsure about is 17059, which doesn’t
> seem to increase modularity at all. It looks to be a kind of plugin hook, and
> IMO should definitely be addressed separately. Perhaps a simple DISCUSS
> thread and its Jira will suffice?
Ok. I will remove that one from the CEP to discuss separately.

> On Oct 26, 2021, at 2:32 PM, bened...@apache.org wrote:
>
>> I'm not particularly sympathetic to the concerns about friction on making
>> changes to internal API's since modern IDE tooling makes this a trivial
>> exercise
>
> We’re getting abstract here, so this isn’t a rebuttal, or even tied too
> strongly to this particular discussion, but an attempt to express my point
> more clearly.
>
> We don’t abstract everything in the codebase, and in fact in general we (or
> at least, I) try to keep things concrete as long as there’s no reason to
> abstract them, because this is usually easier to reason about and lower
> overhead to modify. This is true even on the single class level, so of course
> it happens at the module level. This isn’t about the IDE refactoring, but the
> cognitive burden of reasoning simultaneously about the concrete class and the
> abstraction, and how they relate.
>
> The problem with premature abstraction, and particularly when multiple
> implementations start appearing, is that you have to start formalising the
> abstractions in ways that permit you to reason only about the abstraction.
> This necessarily means eschewing some knowledge of how the concrete
> implementation(s) work. This may prevent very useful simplifications for how
> you interact with a specific concrete implementation, as we have to code to
> the API. This may prevent optimisations. This may also introduce additional
> complexity when either implementing the abstraction or when reasoning about
> the actions you are performing against it, where often you may not entirely
> ignore the concrete implementation (due to imperfect or ambiguous API
> specifications), so you must now consider if you are compatible with both the
> abstraction and any known concrete implementations.
>
> These are all additional burdens, but we often pay the cost for perceived
> benefits.
>
> It seems to me though that this discussion is conflating
> modularisation/pluggability with decoupling, which is a benefit we might gain
> in return for these additional costs. To me this is a distinct problem,
> however. It’s quite possible to modularise and yet tightly couple, though
> usually it will break tight coupling. But breaking tight coupling doesn’t
> require modularisation, and certainly doesn’t require pluggability.
>
> To bring it back to this discussion, the intent of a piece of work always
> drives the outcome, and in my opinion it is best to always consider a work in
> its actual context. The primary purpose of this work is pluggability, and so
> this will inform the API modifications. A straightforward goal of reducing
> tight coupling in the codebase would likely approach this problem
> differently. None of this is a bad thing, just in my opinion the nature of
> development.
>
> That said, I’m broadly happy to see this work go ahead. I would prefer to
> split the conversations out into their driving projects for the
> aforementioned reasons, but I wouldn’t veto the proposal on that basis. It
> would be nice to see others’ opinions about this.
>
> The only sub-proposal I’m particularly unsure about is 17059, which doesn’t
> seem to increase modularity at all. It looks to be a kind of plugin hook, and
> IMO should definitely be addressed separately. Perhaps a simple DISCUSS
> thread and its Jira will suffice?
>
>
> From: Joshua McKenzie <jmcken...@apache.org>
> Date: Tuesday, 26 October 2021 at 19:16
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] CEP-18: Improving Modularity
>>
>> To me having some defined interfaces for interacting with different
>> sections of the code is a huge boon for improving developer productivity
>> going forward in the project.
>> Every place where we can reduce the amount
>> of code reaching inside another module to get at a random internal class is
>> a positive,
>
> I've long been of the opinion that the benefits outweigh the costs of
> having clear interface points between major subsystems in a codebase. I'm
> not particularly sympathetic to the concerns about friction on making
> changes to internal API's since modern IDE tooling makes this a trivial
> exercise, however I _am_ quite sympathetic to the concerns about
> introducing friction against deeper integrations between subsystems.
>
> That said, we have a history on the project of being somewhat hot and cold
> when it comes to our approach to performance testing; I think our low
> hanging fruit as a project revolves more around discipline and
> reproducibility on knowing where our performance is today and making
> changes with an eye to that rather than keeping open the flexibility of
> tightly coupling subsystems through their implementations.
>
> With the modern runtime environment shifting so much toward
> containerization I can't help but think smaller, clearly modularized
> components are more resilient against a rapidly evolving runtime
> environment and more sympathetic to the constrained resource environments
> they run in, as well as more classically optimizable in their own right.
>
> I air all this just to contribute perspective to the discussion; all that
> said, I think refactoring APIs as a pure reflection of what the DB is doing
> today just risks ossifying something that grew up organically and probably
> isn't going to do us any favors, so having a use-case (or better yet a few
> implementations) we're deriving an interface from, or targeting a more
> testable / mockable structure plus introducing those tests should give us
> guidance to improve the route we go.
>
> ~Josh
>
>
> On Mon, Oct 25, 2021 at 4:22 PM Jeremiah D Jordan <jerem...@datastax.com>
> wrote:
>
>> As Henrik said we have been refactoring access to these different internal
>> APIs as part of some larger work. For this CEP we pulled together a bunch
>> of the smaller ones into one place, similar to the refactoring proposed in
>> CEP-10, as we felt doing many small CEPs, one per module, would be less
>> productive if there was support in the project in general for trying to
>> standardize access to different sections of the code and start creating a
>> more defined internal API. If there is consensus that it would be better
>> to propose each change as its own CEP, or even just as single tickets
>> without a CEP for these internal refactors, we can do that as well. The
>> CEP process is evolving as we go through these, so just trying to figure
>> out the best way forward.
>>
>> The currently proposed changes in CEP-18 should all include improved test
>> coverage of the modules in question. We have been developing them all with
>> a requirement that all changes have at least 80% code coverage from
>> SonarCloud JaCoCo reports. We have also found and fixed some bugs in the
>> existing code during this development work.
>>
>> To me having some defined interfaces for interacting with different
>> sections of the code is a huge boon for improving developer productivity
>> going forward in the project. Every place where we can reduce the amount
>> of code reaching inside another module to get at a random internal class is
>> a positive, as it prevents unknown side effects when changing that module
>> when the person developing the new feature did not realize other parts of
>> the code were depending on some current internal behavior that was not
>> clearly part of the module's interface.
>>
>> On the question of changing internal interfaces that I have seen in some
>> other venues, I do not think creating such interfaces should prevent us
>> from changing them as needed for future work. I think having the
>> interfaces actually improves on our ability to do so without breaking other
>> parts of the code. My suggestion would be that we try not to make such
>> changes in patch releases if possible, but again I wouldn’t let that hold
>> anything back.
>>
>> So do people feel we should re-propose these as multiple CEP’s or just
>> tickets? Or do people prefer to have a discussion/vote on the idea of
>> improving the modularity of the code base in general?
>>
>> -Jeremiah
>>
>>> On Oct 25, 2021, at 9:26 AM, bened...@apache.org wrote:
>>>
>>> Thanks Henrik for the additional context.
>>>
>>> I’m not personally a fan of modularity only for modularity’s sake.
>>> Everything in software is a balancing act of competing priorities, and
>>> while pluggability supports certain use cases it can slow down development
>>> or prevent deeper integrations by preventing assumptions about how systems
>>> operate.
>>>
>>> To be clear, I’m fully in favour of helping to enable your use cases, I
>>> just think it is important to make a decision for each refactor based on
>>> the merits and goals in question. If the justification is improved testing,
>>> then testing should be a core goal of the CEP. If it’s enabling a feature
>>> to be upstreamed later, I personally would prefer to tie the refactors to
>>> those features – which I hope will all find broad support for inclusion;
>>> certainly those I have heard of, I am eager to see arrive in Cassandra.
>>>
>>> If the goal is to support entirely external features, we have to decide
>>> what kind of support we offer to these APIs, and this probably needs to be
>>> discussed on a per-API basis with the justification for pluggability
>>> weighed against any constraints this imposes on development.
>>> The most obvious example here is membership and schema, which I think is
>>> primarily to support an external dependency, but we expect this area of
>>> the codebase to be significantly revised over the coming months.
>>>
>>>
>>> From: Henrik Ingo <henrik.i...@datastax.com>
>>> Date: Monday, 25 October 2021 at 14:52
>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>> Subject: Re: [DISCUSS] CEP-18: Improving Modularity
>>> Hi Benedict
>>>
>>> This CEP is a bundle of APIs arising out of our recent work to re-architect
>>> Cassandra into a more cloud native architecture. What our product marketing
>>> has chosen to call "Serverless" is a variant of Cassandra where we have
>>> separated compute from storage (coordinator vs data node), used S3-like
>>> storage, and made various improvements to better support multi-tenancy in a
>>> single Cassandra (Serverless) cluster. This whitepaper [1] explains this
>>> work in detail for those of you interested to learn more. (Apologies that
>>> it requires registration and the first page may at times sound a bit
>>> marketingy, but it's really the most detailed report we have published so
>>> far.)
>>>
>>> [1] https://www.datastax.com/resources/whitepaper/astra-serverless
>>>
>>> The above work was implemented in a way where by default a user can
>>> continue to run Cassandra in the familiar "classic" way. The APIs
>>> introduced by CEP-18 on the other hand allow alternate or additional
>>> functionality to be provided, which in our case we have used to create a
>>> "serverless" way of deploying a Cassandra cluster.
>>>
>>> The logic behind proposing this bundle of APIs separately is roughly for
>>> these reasons:
>>>
>>> The APIs touch existing code and functionality, so to minimize risk to the
>>> next Cassandra release, it would make sense to try to complete merging this
>>> work as early as possible in the development cycle. For the same reason,
>>> keeping the new implementations out of this CEP allows us to focus review -
>>> both of the CEP, and the eventual pull requests - on the APIs themselves,
>>> whereas the related implementations (or plug-ins) would add to the scope
>>> quite significantly. On the other hand non-default plugin functionality can
>>> be added later with much lower risk.
>>>
>>> Second, while it's completely fair to ask for context, why was this
>>> particular refactoring or API done in the first place, the assumption for a
>>> CEP like this one is that better defined interfaces, that are better
>>> documented and come with better test coverage than existing code, should be
>>> enough legs to stand on in itself. Also, in the best case a good API will
>>> also enable other implementations than the one we had in mind when
>>> developing the API, so we wouldn't want to tie the discussion too much into
>>> the implementation that happened to be the first. (As an example of this
>>> working out nicely, your own work in CASSANDRA-16926 was for you motivated
>>> by enabling a new kind of testing, but it also just so happens it is the
>>> same work that enables someone to implement remote file storage, which we
>>> therefore could drop from this CEP-18.)
>>>
>>> Conversely also, it was our expectation when proposing this CEP that
>>> "better modularity" at least on a high level should be a fairly
>>> straightforward conversation, while the actual plugins that make up our
>>> "serverless" new architecture may reasonably ignite much more debate, or at
>>> least questions as to how they work.
>>> As we have a backlog of several fairly
>>> substantial CEPs lined up, we are trying to be very mindful of the
>>> bandwidth of the developers on this list. For example, last week Jacek also
>>> proposed CEP-17 for discussion. So we are trying to focus the discussion on
>>> what's in CEP-17 and CEP-18 for now. (In addition I remember at least 2
>>> CEPs that were discussed but not yet voted on. I don't know if this adds to
>>> cognitive load for anyone else than myself.)
>>>
>>> henrik
>>>
>>> On Mon, Oct 25, 2021 at 12:39 PM bened...@apache.org <bened...@apache.org>
>>> wrote:
>>>
>>>> Hi Jeremiah,
>>>>
>>>> My personal view is that work to modularise the codebase should be tied to
>>>> specific use cases. If improved testing is the purpose of this work, I
>>>> think it would help to include those improved tests that you plan to
>>>> support as goals for the CEP.
>>>>
>>>> If on the other hand some of this work is primarily intended to enable
>>>> certain features, I personally think it would be preferable to tie them to
>>>> those features - perhaps with their own CEP?
>>>>
>>>>
>>>> From: Jeremiah Jordan <jeremiah.jor...@gmail.com>
>>>> Date: Friday, 22 October 2021 at 16:24
>>>> To: Cassandra DEV <dev@cassandra.apache.org>
>>>> Subject: [DISCUSS] CEP-18: Improving Modularity
>>>> Hi All,
>>>> As has been seen with the work already started in CEP-10, increasing the
>>>> modularity of our subsystems can improve their testability, and also the
>>>> ability to try new implementations without breaking things.
>>>>
>>>> Our team has been working on doing this and CEP-18 has been created to
>>>> propose adding more modularity to a few different subsystems.
>>>>
>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-18%3A+Improving+Modularity
>>>>
>>>> CASSANDRA-17044 has already been created for Schema Storage changes related
>>>> to this work and more JIRAs and PRs are to follow for the other subsystems
>>>> proposed in the CEP.
>>>>
>>>> Thanks,
>>>> -Jeremiah Jordan
>>>>
>>>
>>>
>>> --
>>>
>>> Henrik Ingo