> > "Where do we do that?" is a more tricky question.
Sorry, I was not really clear with that comment. What I was wondering is whether we should create a minor version (e.g. 4.1) to address that issue. I am also against making the change in the 4.0 branch.

On Tue, 13 Jul 2021 at 16:09, bened...@apache.org <bened...@apache.org> wrote:

> My point is that we all have different premises we are working from. I don’t think you can convince me that I am mistaken about how I interpret the word feature. The release lifecycle document we voted on is ambiguous, and we all clearly take it to mean different things.
>
> From: Jeremiah D Jordan <jeremiah.jor...@gmail.com>
> Date: Tuesday, 13 July 2021 at 15:06
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>
> Just because it is a feature for users who are developers does not mean it is not a new feature? Adding this capability is adding new functionality to what developers can do with Apache Cassandra. How is that not a new feature?
>
> Semver has been brought up a lot in conversations around what can go where. If we look at how semver defines such things:
>
> 1. MAJOR version when you make incompatible API changes,
> 2. MINOR version when you add functionality in a backwards compatible manner, and
> 3. PATCH version when you make backwards compatible bug fixes.
>
> This change to me sounds like 2. Adding new functionality in a backwards compatible manner. I guess our issue here is that we have never actually done MINOR releases in the C* project; we only make MAJOR releases and PATCH releases. So we need to decide where things that in semver would go in a MINOR version should go. In my mind it was always that such things should only go to a MAJOR, as it seems less safe to relax what goes in a PATCH and allow them there.
>
> -Jeremiah
>
> > On Jul 13, 2021, at 8:47 AM, bened...@apache.org wrote:
> >
> >> I do think adding the ability to do “Cluster and Code Simulations” is a new feature.
> >
> > I don’t. I understand a feature to be a user-visible change, such as new functionality, and it was on this basis I endorsed the release lifecycle document. I do not believe that all improvement to patch releases should stop, as I do not believe this produces the highest quality outcome.
> >
> > From: Jeremiah D Jordan <jerem...@datastax.com>
> > Date: Tuesday, 13 July 2021 at 14:41
> > To: Cassandra DEV <dev@cassandra.apache.org>
> > Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> >
> > I do not think fixing CASSANDRA-12126 is a new feature. I do think adding the ability to do “Cluster and Code Simulations” is a new feature.
> >
> > -Jeremiah
> >
> >> On Jul 13, 2021, at 8:37 AM, bened...@apache.org wrote:
> >>
> >> Nothing we’re discussing constitutes a feature. We’re discussing stability enhancements, and important bug fixes.
> >>
> >> I think this disagreement is to some extent founded on our different premises about what a patch release should contain, and this seems to be the fault of incompletely specified documentation.
> >>
> >> 1. The release lifecycle only forbids feature work from being developed in a patch release, and only expressly includes bug fixes. Note that the document even has a comment by the author suggesting that features may be backported to a patch release from trunk (not something I agree with, but it demonstrates the ambiguity of the definition).
> >> 2. There seems to be some conflation of size-of-change with the admissibility wrt release lifecycle – I don’t think there are any criteria here, and it’s open to the community’s case-by-case assessment. Whatever we do to fix the bug in question will necessarily be a very significant piece of work itself, for instance.
> >>
> >> My interpretation of the release lifecycle document is that it is acceptable to include this work in a patch release.
> >> My belief about its impact is that it would contribute positively to the stability of the project’s 4.0 releases over the lifecycle, and improve project velocity.
> >>
> >> With respect to whether we can ship a fix to 12126 without validation, I would be strongly opposed to this, and certainly would not produce a patch myself in this way. Not only would it be burdensome (given the divergences in the codebase), but I would not consider it acceptably safe (given the divergence).
> >>
> >> From: Jeremiah D Jordan <jeremiah.jor...@gmail.com>
> >> Date: Tuesday, 13 July 2021 at 14:15
> >> To: Cassandra DEV <dev@cassandra.apache.org>
> >> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> >>
> >> I tend to agree with Paulo that a major refactoring of some internal interfaces sounds like something to be explicitly avoided in a patch release. I thought this was the type of change we all agreed we should stop letting in to patch releases, and that we would attempt to release more often (once a year) so changes that only go to trunk would get out faster? Are we really wanting to break that promise to ourselves before we even release 4.0? To me “I think we need this feature released faster” is not a reason to put it in 4.0, it could be a reason to release 4.1 sooner. This is where having a releasable trunk helps, as if we decided as a project that some change was worth a new major being released early the effort of doing that release is much smaller when trunk is releasable.
> >>
> >> Any fix we make in 4.0 would be merged forward into trunk and could be fully verified there? Probably not the best, but would give more confidence in a fix than otherwise without adding other major changes to 4.0?
> >>
> >> -Jeremiah
> >>
> >>> On Jul 13, 2021, at 7:59 AM, Benjamin Lerer <b.le...@gmail.com> wrote:
> >>>
> >>>> Furthermore, we introduced a significant performance regression in all lines of the software by increasing the number of LWT round-trips. Unless we intend to leave this regression for a further year without _any_ release offering a solution, we will need suitable verification mechanisms for whatever fixes we deliver.
> >>>>
> >>>> My view is that it is unacceptable to leave such a significant regression unaddressed in all lines of software we intend to release for the foreseeable future.
> >>>
> >>> I would like to expand a bit on this as I believe it might be important for people to have the full picture. The fix for CASSANDRA-12126 <https://issues.apache.org/jira/browse/CASSANDRA-12126> introduced a regression by increasing the number of LWT round-trips. Nevertheless, the patch introduced a flag to allow users to revert to the previous behavior (previous performance + consistency issue).
> >>>
> >>> Also, the patch did not address all paxos consistency issues. There are still some issues during topology changes (and maybe in some other scenarios).
> >>>
> >>> My understanding of Benedict's proposal is to fix paxos once and for all without any performance regression.
> >>>
> >>> That goal makes total sense to me. "Where do we do that?" is a more tricky question.
> >>>
> >>> On Tue, 13 Jul 2021 at 14:46, bened...@apache.org <bened...@apache.org> wrote:
> >>>
> >>>> Hmm. It occurs to me I’m not entirely sure how our new release process is going to work.
> >>>>
> >>>> Will we be releasing 4.1 builds immediately, as part of shippable trunk? Or will 4.0 be our only active line of software for the next year?
> >>>>
> >>>> Either way, I bet my bottom dollar there will come some regret if we introduce such divergence between the two most active branches we maintain, so early in their lifecycles. If we invest significant resources in improved testing using this framework (which I very much expect) then branches that are not compatible will not benefit, likely reducing their quality; and the risk of backports will increase, due to divergence.
> >>>>
> >>>> Altogether, I think it would be a huge mistake. But if we will be shipping releases soon that can fix these aforementioned regressions, I won’t campaign for it.
> >>>>
> >>>> From: bened...@apache.org <bened...@apache.org>
> >>>> Date: Tuesday, 13 July 2021 at 13:31
> >>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> >>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> >>>>
> >>>> No change is without risk; we have introduced serious regressions with bug fixes to patch releases. The overall risk to the release lifecycle is reduced significantly in my opinion, as we reduce the likelihood of introducing regressions, and can use the same test infrastructure across all of the actively developed releases, increasing our confidence in 4.0.x releases.
> >>>>
> >>>> Furthermore, we introduced a significant performance regression in all lines of the software by increasing the number of LWT round-trips. Unless we intend to leave this regression for a further year without _any_ release offering a solution, we will need suitable verification mechanisms for whatever fixes we deliver.
> >>>>
> >>>> My view is that it is unacceptable to leave such a significant regression unaddressed in all lines of software we intend to release for the foreseeable future.
> >>>>
> >>>> From: Paulo Motta <pauloricard...@gmail.com>
> >>>> Date: Tuesday, 13 July 2021 at 13:21
> >>>> To: Cassandra DEV <dev@cassandra.apache.org>
> >>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> >>>>
> >>>>> No, in my opinion the target should be 4.0.x. We are reaching for a shippable trunk and this has no public API impacts. This work is IMO central to achieving a shippable trunk, either way. The only reason I do not target 3.x is that it would be too burdensome.
> >>>>
> >>>> In my limited view of the proposal, a major refactor of internal concurrency APIs to support the testing facility potentially risks the stability of a minor release, something we've been wanting to avoid with our focus on stability. So I'd prefer this to go in trunk/4.1, otherwise we will set a precedent for including non-bugfix changes in minor versions, something I think we should avoid.
> >>>>
> >>>> In the past we've been lenient about including seemingly harmless internal changes that caused client impact and we should be careful to avoid this in the future. To prevent this I think we should take a strict approach and only accept bug fixes in minor (i.e. 4.0.x) versions moving forward.
> >>>>
> >>>> I'd go one step further and propose that any CEPs, which are generally about new features, major API changes or internal refactorings, should only be allowed in subsequent major versions, unless an explicit exception is granted.
> >>>>
> >>>> On Tue, 13 Jul 2021 at 07:11, bened...@apache.org <bened...@apache.org> wrote:
> >>>>
> >>>>> Perhaps it’s worth looking forward at the roadmap that we plan to develop, and consider whether such a facility would be welcome for proving their safety, and we can then worry about evolving the specifics of any API(s) together as we deploy the capability?
> >>>>> Looking ahead, there are very few major features I wouldn’t want to see exercised with this approach, given the choice.
> >>>>>
> >>>>> The LWT Verifier by itself is an integration test that covers many of the affected subsystems, including sstables, memtables and repair. But we will have the ability to introduce dedicated verification for each of these features and systems, and we will necessarily produce more robust code (repair is a great example of a brittle system that would be impossible to produce with such an adversarial test system).
> >>>>>
> >>>>> *Query side improvements:*
> >>>>>
> >>>>> * Storage Attached Index or SAI. The CEP can be found at https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index
> >>>>> * Add support for OR predicates in the CQL where clause
> >>>>> * Allow to aggregate by time intervals (CASSANDRA-11871) and allow UDFs in GROUP BY clause
> >>>>> * Ability to read the TTL and WRITE TIME of an element in a collection (CASSANDRA-8877)
> >>>>> * Multi-Partition LWTs
> >>>>> * Materialized views hardening: Addressing the different Materialized Views issues (see CASSANDRA-15921 and [1] for some of the work involved)
> >>>>>
> >>>>> *Security improvements:*
> >>>>>
> >>>>> * SSTables encryption (CASSANDRA-9633)
> >>>>> * Add support for Dynamic Data Masking (CEP pending)
> >>>>> * Allow the creation of roles that have the ability to assign arbitrary privileges, or scoped privileges without also granting those roles access to database objects.
> >>>>> * Filter rows from system and system_schema based on user permissions (CASSANDRA-15871)
> >>>>>
> >>>>> *Performance improvements:*
> >>>>>
> >>>>> * Trie-based index format (CEP pending)
> >>>>> * Trie-based memtables (CEP pending)
> >>>>> * Paxos improvements: Paxos / LWT implementation that would enable the database to serve serial writes with two round-trips and serial reads with one round-trip in the uncontended case
> >>>>>
> >>>>> *Safety/Usability improvements:*
> >>>>>
> >>>>> * Guardrails. The CEP can be found at https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails
> >>>>> * Add ability to track state in repair (CASSANDRA-15399)
> >>>>> * Repair coordinator improvements (CASSANDRA-15399)
> >>>>> * Make incremental backup configurable per keyspace and table (CASSANDRA-15402)
> >>>>> * Add ability to blacklist a CQL partition so all requests are ignored (CASSANDRA-12106)
> >>>>> * Add default and required keyspace replication options (CASSANDRA-14557)
> >>>>> * Transactional Cluster Metadata: Use of transactions to propagate cluster metadata
> >>>>> * Downgrade-ability: Ability to downgrade in the event that a serious issue has been identified
> >>>>>
> >>>>> *Pluggability improvements:*
> >>>>>
> >>>>> * Pluggable schema manager (CEP pending)
> >>>>> * Pluggable filesystem (CEP pending)
> >>>>> * Pluggable authenticator for CQLSH (CASSANDRA-16456). A CEP draft can be found at https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit
> >>>>> * Memtable API (CEP pending).
> >>>>>   The goal being to allow improvements such as CASSANDRA-13981 to be easily plugged into Cassandra
> >>>>>
> >>>>> *Memtable pluggable implementation:*
> >>>>>
> >>>>> * Enable Cassandra for Persistent Memory (CASSANDRA-13981)
> >>>>>
> >>>>> From: bened...@apache.org <bened...@apache.org>
> >>>>> Date: Tuesday, 13 July 2021 at 10:51
> >>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> >>>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> >>>>>
> >>>>> Ach, editing code in the email editor isn’t smart when editors all have different meanings for key combinations (accidentally hit send), but you get the idea. The simulator would intercept these thread executions, the memory accesses for the annotated field, and evaluate them so that in some cases the assertions would fail.
> >>>>>
> >>>>> This is obviously a toy example that is not very interesting, but the main real example we have is too complicated to produce a snippet to demonstrate. In my view, the long term outcome of this work is likely the enablement of many unit tests that are a little more complicated than this, on less obvious code.
> >>>>>
> >>>>> But the headline goal of the CEP is not. By itself, the LWT Verifier demonstrates the power and utility of the work. I don’t believe it is terribly helpful to focus on secondary justifications like the example I gave. For me, the _ability_ to prove the correctness of difficult but critical systems is justification enough, whether or not we deliver a simple API as part of the CEP.
> >>>>>
> >>>>> From: bened...@apache.org <bened...@apache.org>
> >>>>> Date: Tuesday, 13 July 2021 at 10:43
> >>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> >>>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> >>>>>
> >>>>>> Should target release be 4.1 (not 4.0.x)?
> >>>>>
> >>>>> No, in my opinion the target should be 4.0.x. We are reaching for a shippable trunk and this has no public API impacts. This work is IMO central to achieving a shippable trunk, either way. The only reason I do not target 3.x is that it would be too burdensome.
> >>>>>
> >>>>>> My concern is that changing code and tests at the same time risks regressions…
> >>>>>
> >>>>> I’ve never heard this position before. Would you care to elaborate? It is quite normal for us to update tests alongside changes to the code.
> >>>>>
> >>>>>> And seconding Benjamin's comments… some documentation on how to write a test, and a simple test example, that this CEP then allows us to write would help a lot (a la "working backwards").
> >>>>>
> >>>>> 1) This work is to _enable_ the development of tests, with the only test originally planned to arrive alongside it the fairly sophisticated LWT Verifier. This is something we have sorely needed as a project, as we have had serious correctness violations for multiple years. This broad category of integrated test for verifying correctness is the main goal of the work and is not easily condensed into an example snippet.
> >>>>> 2) It is _possible_ that some simple and fluid APIs will be introduced in a later phase of this work, but they haven’t been designed yet, so I cannot share snippets.
> >>>>>
> >>>>> In principle, however, you would be able to do something like:
> >>>>>
> >>>>> // @Nemesis: hypothetical annotation; the simulator intercepts accesses to this field
> >>>>> @Nemesis volatile int x = 0;
> >>>>> int foo() {
> >>>>>     x = x + 1;   // read and write are separate steps, so not atomic
> >>>>>     return x;
> >>>>> }
> >>>>>
> >>>>> @Test
> >>>>> void test() throws Exception {
> >>>>>     Future<Integer> f1 = executor.submit(() -> foo());
> >>>>>     Future<Integer> f2 = executor.submit(() -> foo());
> >>>>>     Assert.assertTrue(f1.get() == 1 || f2.get() == 1);
> >>>>> }
> >>>>>
> >>>>> From: Mick Semb Wever <m...@apache.org>
> >>>>> Date: Tuesday, 13 July 2021 at 10:28
> >>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> >>>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> >>>>>
> >>>>>> To achieve this, significant modifications will be required to the codebase, mostly cleaning up existing abstractions. Specifically, we will need to be able to mock executors, any blocking concurrency primitives, time, filesystem access and internode streaming.
> >>>>>>
> >>>>>> The work is – in large part – already complete, with JIRA and PRs to follow in the coming weeks. Of course, the work is subject to the usual community input and review, so this does not preclude changes to the work (even significant ones, if they are warranted). I know a lot of incoming CEPs are likely to be backed up by significant off-list development as a result of the focus on a shippable 4.0. Hopefully this is just a temporary growing pain, particularly as we move towards a shippable trunk.
> >>>>>>
> >>>>>> I hope this work will be of huge value to the project, particularly as we race to catch up on years of limited feature development.
> >>>>>>
> >>>>>> JIRA and PRs will follow, but I wanted to kick off discussion in advance.
> >>>>>
> >>>>> Should target release be 4.1 (not 4.0.x)?
> >>>>>
> >>>>> I'd be interested in seeing a rough timeline/plan of how the proposed changes are to be defined in JIRAs and ordered.
> >>>>>
> >>>>> I'd like to hear a bit more about the test plan. Not so much about how the CEP itself improves testability of the project, but for example the testing required to be in place to introduce the changes of the CEP (and if it already exists, where). My concern is that changing code and tests at the same time risks regressions…
> >>>>>
> >>>>> And seconding Benjamin's comments… some documentation on how to write a test, and a simple test example, that this CEP then allows us to write would help a lot (a la "working backwards").
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
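
For readers following the toy example quoted in this thread (the `@Nemesis volatile int x` snippet), here is a minimal sketch of why its assertion can fail under some interleavings. It is not the simulator from the CEP and uses none of its APIs: the class name `InterleavingSketch` and all helpers are hypothetical, and the sketch assumes `foo()` can be modelled as three separate steps (read `x`, write `x`, read `x` again to return it). It enumerates every schedule of two such threads by hand and collects the possible return values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: a hand-rolled enumeration of interleavings,
// not the simulator proposed in CEP-10.
public class InterleavingSketch {
    // Shared state standing in for the volatile field x in the toy example.
    static int x;

    /** Runs one schedule; each entry says which thread (0 or 1) takes its next step. */
    static int[] run(int[] schedule) {
        x = 0;
        int[] pc = new int[2];      // next step index for each thread
        int[] tmp = new int[2];     // per-thread local holding the value read in step 0
        int[] result = new int[2];  // value foo() returns in each thread
        for (int t : schedule) {
            switch (pc[t]++) {
                case 0: tmp[t] = x; break;      // read x
                case 1: x = tmp[t] + 1; break;  // write x = (value read) + 1
                case 2: result[t] = x; break;   // read x again for "return x"
            }
        }
        return result;
    }

    /** Enumerates every schedule in which each thread takes exactly three steps. */
    static void schedules(int[] prefix, int pos, int zeros, int ones, List<int[]> out) {
        if (pos == prefix.length) { out.add(prefix.clone()); return; }
        if (zeros < 3) { prefix[pos] = 0; schedules(prefix, pos + 1, zeros + 1, ones, out); }
        if (ones < 3) { prefix[pos] = 1; schedules(prefix, pos + 1, zeros, ones + 1, out); }
    }

    /** Returns the distinct (f1, f2) outcomes over all interleavings, as "a,b" strings. */
    static Set<String> outcomes() {
        List<int[]> all = new ArrayList<>();
        schedules(new int[6], 0, 0, 0, all);
        Set<String> seen = new TreeSet<>();
        for (int[] s : all) {
            int[] r = run(s);
            seen.add(r[0] + "," + r[1]);
        }
        return seen;
    }

    public static void main(String[] args) {
        for (String o : outcomes()) {
            String[] p = o.split(",");
            boolean holds = p[0].equals("1") || p[1].equals("1");
            System.out.println("f1,f2 = " + o + (holds ? "" : "  <- assertion would fail"));
        }
    }
}
```

Under this model the schedule "thread 0 reads and writes, thread 1 reads and writes, then both return" yields 2 for both calls, so `f1.get() == 1 || f2.get() == 1` does not hold in every interleaving. A plain test run would rarely hit that schedule, which is the gap a tool that controls scheduling is meant to close.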