Just because it is a feature for users who are developers does not mean it is not a new feature. Adding this capability adds new functionality to what developers can do with Apache Cassandra. How is that not a new feature?
Semver has been brought up a lot in conversations around what can go where. If we look at how semver defines such things:

1. MAJOR version when you make incompatible API changes,
2. MINOR version when you add functionality in a backwards compatible manner, and
3. PATCH version when you make backwards compatible bug fixes.

This change to me sounds like 2: adding new functionality in a backwards compatible manner.

I guess our issue here is that we have never actually done MINOR releases in the C* project; we only make MAJOR releases and PATCH releases. So we need to decide where things that semver would put in a MINOR version should go. In my mind it was always that such things should only go to a MAJOR, as it seems less safe to relax what goes in a PATCH and allow them there.

-Jeremiah

> On Jul 13, 2021, at 8:47 AM, bened...@apache.org wrote:
>
>> I do think adding the ability to do “Cluster and Code Simulations” is a new feature.
>
> I don’t. I understand a feature to be a user-visible change, such as new functionality, and it was on this basis I endorsed the release lifecycle document. I do not believe that all improvement should stop to patch releases, as I do not believe this produces the highest quality outcome.
>
> From: Jeremiah D Jordan <jerem...@datastax.com>
> Date: Tuesday, 13 July 2021 at 14:41
> To: Cassandra DEV <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> I do not think fixing CASSANDRA-12126 is a new feature. I do think adding the ability to do “Cluster and Code Simulations” is a new feature.
>
> -Jeremiah
>
>> On Jul 13, 2021, at 8:37 AM, bened...@apache.org wrote:
>>
>> Nothing we’re discussing constitutes a feature. We’re discussing stability enhancements, and important bug fixes.
>>
>> I think this disagreement is to some extent founded on our different premises about what a patch release should contain, and this seems to be the fault of incompletely specified documentation.
>>
>> 1. The release lifecycle only forbids feature work from being developed in a patch release, and only expressly includes bug fixes. Note that the document even has a comment by the author suggesting that features may be backported to a patch release from trunk (not something I agree with, but it demonstrates the ambiguity of the definition).
>> 2. There seems to be some conflation of size-of-change with the admissibility wrt release lifecycle – I don’t think there are any criteria here, and it’s open to the community’s case-by-case assessment. Whatever we do to fix the bug in question will necessarily be a very significant piece of work itself, for instance.
>>
>> My interpretation of the release lifecycle document is that it is acceptable to include this work in a patch release. My belief about its impact is that it would contribute positively to the stability of the project’s 4.0 releases over the lifecycle, and improve project velocity.
>>
>> With respect to whether we can ship a fix to 12126 without validation, I would be strongly opposed to this, and certainly would not produce a patch myself in this way. Not only would it be burdensome (given the divergences in the codebase), but I would not consider it acceptably safe (given the divergence).
>>
>> From: Jeremiah D Jordan <jeremiah.jor...@gmail.com>
>> Date: Tuesday, 13 July 2021 at 14:15
>> To: Cassandra DEV <dev@cassandra.apache.org>
>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>> I tend to agree with Paulo that a major refactoring of some internal interfaces sounds like something to be explicitly avoided in a patch release. I thought this was the type of change we all agreed we should stop letting in to patch releases, and that we would attempt to release more often (once a year) so changes that only go to trunk would get out faster? Are we really wanting to break that promise to ourselves before we even release 4.0?
>> To me “I think we need this feature released faster” is not a reason to put it in 4.0, it could be a reason to release 4.1 sooner. This is where having a releasable trunk helps, as if we decided as a project that some change was worth a new major being released early, the effort of doing that release is much smaller when trunk is releasable.
>>
>> Any fix we make in 4.0 would be merged forward into trunk and could be fully verified there? Probably not the best, but would give more confidence in a fix than otherwise without adding other major changes to 4.0?
>>
>> -Jeremiah
>>
>>> On Jul 13, 2021, at 7:59 AM, Benjamin Lerer <b.le...@gmail.com> wrote:
>>>
>>>> Furthermore, we introduced a significant performance regression in all lines of the software by increasing the number of LWT round-trips. Unless we intend to leave this regression for a further year without _any_ release offering a solution, we will need suitable verification mechanisms for whatever fixes we deliver.
>>>>
>>>> My view is that it is unacceptable to leave such a significant regression unaddressed in all lines of software we intend to release for the foreseeable future.
>>>
>>> I would like to expand a bit on this as I believe it might be important for people to have the full picture. The fix for CASSANDRA-12126 <https://issues.apache.org/jira/browse/CASSANDRA-12126> introduced a regression by increasing the number of LWT round-trips. Nevertheless, the patch introduced a flag to allow users to revert to the previous behavior (previous performance + consistency issue).
>>>
>>> Also the patch did not address all paxos consistency issues. There are still some issues during topology changes (and maybe in some other scenarios).
>>>
>>> My understanding of Benedict's proposal is to fix paxos once and for all without any performance regression.
>>>
>>> That goal makes total sense to me. "Where do we do that?"
>>> is a trickier question.
>>>
>>> On Tue, 13 Jul 2021 at 14:46, bened...@apache.org <bened...@apache.org> wrote:
>>>
>>>> Hmm. It occurs to me I’m not entirely sure how our new release process is going to work.
>>>>
>>>> Will we be releasing 4.1 builds immediately, as part of shippable trunk? Or will 4.0 be our only active line of software for the next year?
>>>>
>>>> Either way, I bet my bottom dollar there will come some regret if we introduce such divergence between the two most active branches we maintain, so early in their lifecycles. If we invest significant resources in improved testing using this framework (which I very much expect) then branches that are not compatible will not benefit, likely reducing their quality; and the risk of backports will increase, due to divergence.
>>>>
>>>> Altogether, I think it would be a huge mistake. But if we will be shipping releases soon that can fix these aforementioned regressions, I won’t campaign for it.
>>>>
>>>> From: bened...@apache.org <bened...@apache.org>
>>>> Date: Tuesday, 13 July 2021 at 13:31
>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>>>> No change is without risk; we have introduced serious regressions with bug fixes to patch releases. The overall risk to the release lifecycle is reduced significantly in my opinion, as we reduce the likelihood of introducing regressions, and can use the same test infrastructure across all of the actively developed releases, increasing our confidence in 4.0.x releases.
>>>>
>>>> Furthermore, we introduced a significant performance regression in all lines of the software by increasing the number of LWT round-trips.
>>>> Unless we intend to leave this regression for a further year without _any_ release offering a solution, we will need suitable verification mechanisms for whatever fixes we deliver.
>>>>
>>>> My view is that it is unacceptable to leave such a significant regression unaddressed in all lines of software we intend to release for the foreseeable future.
>>>>
>>>> From: Paulo Motta <pauloricard...@gmail.com>
>>>> Date: Tuesday, 13 July 2021 at 13:21
>>>> To: Cassandra DEV <dev@cassandra.apache.org>
>>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>>>>> No, in my opinion the target should be 4.0.x. We are reaching for a shippable trunk and this has no public API impacts. This work is IMO central to achieving a shippable trunk, either way. The only reason I do not target 3.x is that it would be too burdensome.
>>>>
>>>> In my limited view of the proposal, a major refactor of internal concurrency APIs to support the testing facility potentially risks the stability of a minor release, something we've been wanting to avoid with our focus on stability. So I'd prefer this to go in trunk/4.1, otherwise we will set a precedent for including non-bugfix changes in minor versions, something I think we should avoid.
>>>>
>>>> In the past we've been lenient about including seemingly harmless internal changes that caused client impact and we should be careful to avoid this in the future. To prevent this I think we should take a strict approach and only accept bug fixes in minor (i.e. 4.0.x) versions moving forward.
>>>>
>>>> I'd go one step further and propose that any CEPs, which are generally about new features, major API changes or internal refactorings, should only be allowed in subsequent major versions, unless an explicit exception is granted.
>>>>
>>>> On Tue, 13 Jul 2021 at 07:11, bened...@apache.org <bened...@apache.org> wrote:
>>>>
>>>>> Perhaps it’s worth looking forward at the roadmap that we plan to develop, and consider whether such a facility would be welcome for proving their safety, and we can then worry about evolving the specifics of any API(s) together as we deploy the capability? Looking ahead, there are very few major features I wouldn’t want to see exercised with this approach, given the choice.
>>>>>
>>>>> The LWT Verifier by itself is an integration test that covers many of the affected subsystems, including sstables, memtables and repair. But we will have the ability to introduce dedicated verification for each of these features and systems, and we will necessarily produce more robust code (repair is a great example of a brittle system that would be impossible to produce with such an adversarial test system).
>>>>>
>>>>> *Query side improvements:*
>>>>>
>>>>> * Storage Attached Index or SAI. The CEP can be found at https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index
>>>>> * Add support for OR predicates in the CQL where clause
>>>>> * Allow aggregation by time intervals (CASSANDRA-11871) and allow UDFs in GROUP BY clause
>>>>> * Ability to read the TTL and WRITE TIME of an element in a collection (CASSANDRA-8877)
>>>>> * Multi-Partition LWTs
>>>>> * Materialized views hardening: Addressing the different Materialized Views issues (see CASSANDRA-15921 and [1] for some of the work involved)
>>>>>
>>>>> *Security improvements:*
>>>>>
>>>>> * SSTables encryption (CASSANDRA-9633)
>>>>> * Add support for Dynamic Data Masking (CEP pending)
>>>>> * Allow the creation of roles that have the ability to assign arbitrary privileges, or scoped privileges without also granting those roles access to database objects.
>>>>> * Filter rows from system and system_schema based on users' permissions (CASSANDRA-15871)
>>>>>
>>>>> *Performance improvements:*
>>>>>
>>>>> * Trie-based index format (CEP pending)
>>>>> * Trie-based memtables (CEP pending)
>>>>> * Paxos improvements: Paxos / LWT implementation that would enable the database to serve serial writes with two round-trips and serial reads with one round-trip in the uncontended case
>>>>>
>>>>> *Safety/Usability improvements:*
>>>>>
>>>>> * Guardrails. The CEP can be found at https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails
>>>>> * Add ability to track state in repair (CASSANDRA-15399)
>>>>> * Repair coordinator improvements (CASSANDRA-15399)
>>>>> * Make incremental backup configurable per keyspace and table (CASSANDRA-15402)
>>>>> * Add ability to blacklist a CQL partition so all requests are ignored (CASSANDRA-12106)
>>>>> * Add default and required keyspace replication options (CASSANDRA-14557)
>>>>> * Transactional Cluster Metadata: Use of transactions to propagate cluster metadata
>>>>> * Downgrade-ability: Ability to downgrade in the event that a serious issue has been identified
>>>>>
>>>>> *Pluggability improvements:*
>>>>>
>>>>> * Pluggable schema manager (CEP pending)
>>>>> * Pluggable filesystem (CEP pending)
>>>>> * Pluggable authenticator for CQLSH (CASSANDRA-16456). A CEP draft can be found at https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit
>>>>> * Memtable API (CEP pending).
>>>>>   The goal being to allow improvements such as CASSANDRA-13981 to be easily plugged into Cassandra
>>>>>
>>>>> *Memtable pluggable implementation:*
>>>>>
>>>>> * Enable Cassandra for Persistent Memory (CASSANDRA-13981)
>>>>>
>>>>> From: bened...@apache.org <bened...@apache.org>
>>>>> Date: Tuesday, 13 July 2021 at 10:51
>>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>>>>> Ach, editing code in the email editor isn’t smart when editors all have different meanings for key combinations (accidentally hit send), but you get the idea. The simulator would intercept these thread executions, the memory accesses for the annotated field, and evaluate them so that in some cases the assertions would fail.
>>>>>
>>>>> This is obviously a toy example that is not very interesting, but the main real example we have is too complicated to produce a snippet to demonstrate. In my view, the long term outcome of this work is likely the enablement of many unit tests that are a little more complicated than this, on less obvious code.
>>>>>
>>>>> But the headline goal of the CEP is not. By itself, the LWT Verifier demonstrates the power and utility of the work. I don’t believe it is terribly helpful to focus on secondary justifications like the example I gave. For me, the _ability_ to prove the correctness of difficult but critical systems is justification enough, whether or not we deliver a simple API as part of the CEP.
>>>>>
>>>>> From: bened...@apache.org <bened...@apache.org>
>>>>> Date: Tuesday, 13 July 2021 at 10:43
>>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>>>>>> Should target release be 4.1. (not 4.0.x) ?
>>>>>
>>>>> No, in my opinion the target should be 4.0.x.
>>>>> We are reaching for a shippable trunk and this has no public API impacts. This work is IMO central to achieving a shippable trunk, either way. The only reason I do not target 3.x is that it would be too burdensome.
>>>>>
>>>>>> My concern is that changing code and tests at the same time risks regressions…
>>>>>
>>>>> I’ve never heard this position before. Would you care to elaborate? It is quite normal for us to update tests alongside changes to the code.
>>>>>
>>>>>> And seconding Benjamin's comments… some documentation on how to write a test, and a simple test example, that this CEP then allows us to write would help a lot (a la "working backwards").
>>>>>
>>>>> 1) This work is to _enable_ the development of tests, with the only test originally planned to arrive alongside it being the fairly sophisticated LWT Verifier. This is something we have sorely needed as a project, as we have had serious correctness violations for multiple years. This broad category of integrated test for verifying correctness is the main goal of the work and is not easily condensed into an example snippet.
>>>>> 2) It is _possible_ that some simple and fluid APIs will be introduced in a later phase of this work, but they haven’t been designed yet, so I cannot share snippets.
>>>>>
>>>>> In principle, however, you would be able to do something like:
>>>>>
>>>>> @Nemesis volatile int x = 0;
>>>>> int foo() {
>>>>>     x = x + 1;
>>>>>     return x;
>>>>> }
>>>>>
>>>>> @Test
>>>>> void test() throws Exception {
>>>>>     Future<Integer> f1 = executor.submit(() -> foo());
>>>>>     Future<Integer> f2 = executor.submit(() -> foo());
>>>>>     Assert.assertTrue(f1.get() == 1 || f2.get() == 1);
>>>>> }
>>>>>
>>>>> From: Mick Semb Wever <m...@apache.org>
>>>>> Date: Tuesday, 13 July 2021 at 10:28
>>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>>>>>> To achieve this, significant modifications will be required to the codebase, mostly cleaning up existing abstractions. Specifically, we will need to be able to mock executors, any blocking concurrency primitives, time, filesystem access and internode streaming.
>>>>>>
>>>>>> The work is – in large part – already complete, with JIRA and PRs to follow in the coming weeks. Of course, the work is subject to the usual community input and review, so this does not preclude changes to the work (even significant ones, if they are warranted). I know a lot of incoming CEP are likely to be backed up by significant off-list development as a result of the focus on a shippable 4.0. Hopefully this is just a temporary growing pain, particularly as we move towards a shippable trunk.
>>>>>>
>>>>>> I hope this work will be of huge value to the project, particularly as we race to catch up on years of limited feature development.
>>>>>>
>>>>>> JIRA and PRs will follow, but I wanted to kick-off discussion in advance.
>>>>>
>>>>> Should target release be 4.1. (not 4.0.x) ?
>>>>>
>>>>> I'd be interested in seeing a rough timeline/plan of how the proposed changes are to be defined in JIRAs and ordered.
>>>>>
>>>>> I'd like to hear a bit more about the test plan.
>>>>> Not so much about how the CEP itself improves testability of the project, but for example the testing required to be in place to introduce the changes of the CEP (and if it already exists, where). My concern is that changing code and tests at the same time risks regressions…
>>>>>
>>>>> And seconding Benjamin's comments… some documentation on how to write a test, and a simple test example, that this CEP then allows us to write would help a lot (a la "working backwards").
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org