Re: [DISCUSS] CEP-10: Cluster and Code Simulations

Benjamin Lerer Tue, 13 Jul 2021 07:26:15 -0700

>
> "Where do we do that?" is a more tricky question.


Sorry, I was not really clear with that comment. What I was wondering is if
we should create a minor version to address that issue (e.g. 4.1).

I am also against making the change in the 4.0 branch.

On Tue, 13 Jul 2021 at 16:09, bened...@apache.org <bened...@apache.org>
wrote:

> My point is that we all have different premises we are working from. I
> don’t think you can convince me that I am mistaken about how I interpret
> the word feature. The release lifecycle document we voted on is ambiguous,
> and we all clearly take it to mean different things.
>
> From: Jeremiah D Jordan <jeremiah.jor...@gmail.com>
> Date: Tuesday, 13 July 2021 at 15:06
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> Just because it is a feature for users who are developers does not mean it
> is not a new feature?  Adding this capability is adding new functionality
> to what developers can do with Apache Cassandra.  How is that not a new
> feature?
>
> Semver has been brought up a lot in conversations around what can go
> where.  If we look at how semver defines such things:
>
> MAJOR version when you make incompatible API changes,
> MINOR version when you add functionality in a backwards compatible manner,
> and
> PATCH version when you make backwards compatible bug fixes.
>
> This change to me sounds like 2.  Adding new functionality in a backwards
> compatible manner.  I guess our issue here is that we have never actually
> done MINOR releases in the C* project, we only make MAJOR releases and
> PATCH releases.  So we need to decide where things that in semver would go
> in a MINOR version should go.  In my mind it was always that such things
> should only go to a MAJOR, as it seems less safe to relax what goes in a
> PATCH and allow them there.
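The semver rules quoted above can be sketched in a few lines of Java. This is only an illustration of the mapping from change type to version bump; the class, enum, and method names are hypothetical and are not part of Cassandra or of any release tooling.

```java
public class SemverSketch {
    // Hypothetical classification of a change, following the three semver rules.
    enum Change { INCOMPATIBLE_API, COMPATIBLE_FEATURE, COMPATIBLE_FIX }

    // Returns the next version string for a "MAJOR.MINOR.PATCH" input.
    static String next(String version, Change change) {
        String[] parts = version.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        int patch = Integer.parseInt(parts[2]);
        switch (change) {
            case INCOMPATIBLE_API:   // MAJOR bump resets MINOR and PATCH
                return (major + 1) + ".0.0";
            case COMPATIBLE_FEATURE: // MINOR bump resets PATCH
                return major + "." + (minor + 1) + ".0";
            default:                 // backwards compatible bug fix
                return major + "." + minor + "." + (patch + 1);
        }
    }

    public static void main(String[] args) {
        System.out.println(next("4.0.0", Change.COMPATIBLE_FEATURE));
    }
}
```

Under plain semver, a change classed as backwards compatible new functionality would take 4.0.0 to 4.1.0. Whether the project wants to cut such a release is the open question in this thread.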
>
> -Jeremiah
>
> > On Jul 13, 2021, at 8:47 AM, bened...@apache.org wrote:
> >
> >> I do think adding the ability to do “Cluster and Code Simulations” is a
> new feature.
> >
> > I don’t. I understand a feature to be a user-visible change, such as new
> functionality, and it was on this basis I endorsed the release lifecycle
> document. I do not believe that all improvement to patch releases should
> stop, as I do not believe this produces the highest quality outcome.
> >
> >
> >
> >
> > From: Jeremiah D Jordan <jerem...@datastax.com>
> > Date: Tuesday, 13 July 2021 at 14:41
> > To: Cassandra DEV <dev@cassandra.apache.org>
> > Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> > I do not think fixing CASSANDRA-12126 is a new feature.  I do think
> adding the ability to do “Cluster and Code Simulations” is a new feature.
> >
> > -Jeremiah
> >
> >> On Jul 13, 2021, at 8:37 AM, bened...@apache.org wrote:
> >>
> >> Nothing we’re discussing constitutes a feature. We’re discussing
> stability enhancements, and important bug fixes.
> >>
> >> I think this disagreement is to some extent founded on our different
> premises about what a patch release should contain, and this seems to be
> the fault of incompletely specified documentation.
> >>
> >> 1. The release lifecycle only forbids feature work from being developed
> in a patch release, and only expressly includes bug fixes. Note that the
> document even has a comment by the author suggesting that features may be
> backported to a patch release from trunk (not something I agree with, but
> it demonstrates the ambiguity of the definition).
> >> 2. There seems to be some conflation of size-of-change with the
> admissibility wrt release lifecycle – I don’t think there are any criteria
> here, and it’s open to the community’s case-by-case assessment. Whatever we
> do to fix the bug in question will necessarily be a very significant piece
> of work itself, for instance.
> >>
> >> My interpretation of the release lifecycle document is that it is
> acceptable to include this work in a patch release. My belief about its
> impact is that it would contribute positively to the stability of the
> project’s 4.0 releases over the lifecycle, and improve project velocity.
> >>
> >> With respect to whether we can ship a fix to 12126 without validation,
> I would be strongly opposed to this, and certainly would not produce a
> patch myself in this way. Not only would it be burdensome (given the
> divergences in the codebase), but I would not consider it acceptably safe
> (given the divergence).
> >>
> >>
> >> From: Jeremiah D Jordan <jeremiah.jor...@gmail.com>
> >> Date: Tuesday, 13 July 2021 at 14:15
> >> To: Cassandra DEV <dev@cassandra.apache.org>
> >> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> >> I tend to agree with Paulo that a major refactoring of some internal
> interfaces sounds like something to be explicitly avoided in a patch
> release.  I thought this was the type of change we all agreed we should
> stop letting in to patch releases, and that we would attempt to release
> more often (once a year) so changes that only go to trunk would get out
> faster?  Are we really wanting to break that promise to ourselves before we
> even release 4.0?  To me “I think we need this feature released faster” is
> not a reason to put it in 4.0, it could be a reason to release 4.1 sooner.
> This is where having a releasable trunk helps, as if we decided as a
> project that some change was worth a new major being released early the
> effort of doing that release is much smaller when trunk is releasable.
> >>
> >> Any fix we make in 4.0 would be merged forward into trunk and could be
> fully verified there?  Probably not the best, but would give more
> confidence in a fix than otherwise without adding other major changes to
> 4.0?
> >>
> >> -Jeremiah
> >>
> >>> On Jul 13, 2021, at 7:59 AM, Benjamin Lerer <b.le...@gmail.com> wrote:
> >>>
> >>>>
> >>>> Furthermore, we introduced a significant performance regression in all
> >>>> lines of the software by increasing the number of LWT round-trips.
> Unless
> >>>> we intend to leave this regression for a further year without _any_
> release
> >>>> offering a solution, we will need suitable verification mechanisms for
> >>>> whatever fixes we deliver.
> >>>>
> >>>> My view is that it is unacceptable to leave such a significant
> regression
> >>>> unaddressed in all lines of software we intend to release for the
> >>>> foreseeable future.
> >>>
> >>>
> >>> I would like to expand a bit on this as I believe it might be
> important for
> >>> people to have the full picture. The fix for  CASSANDRA-12126
> >>> <https://issues.apache.org/jira/browse/CASSANDRA-12126> introduced a
> >>> regression by increasing the number of LWT round-trips. Nevertheless,
> the
> >>> patch introduced a flag to allow users to revert to the previous
> behavior
> >>> (previous performance + consistency issue).
> >>>
> >>> Also the patch did not address all paxos consistency issues. There are
> >>> still some issues during topology changes (and maybe in some other
> scenarios).
> >>>
> >>> My understanding of Benedict's proposal is to fix paxos once and for
> all
> >>> without any performance regression.
> >>>
> >>> That goal makes total sense to me. "Where do we do that?" is a more
> tricky
> >>> question.
> >>>
> >>> On Tue, 13 Jul 2021 at 14:46, bened...@apache.org <
> bened...@apache.org>
> >>> wrote:
> >>>
> >>>> Hmm. It occurs to me I’m not entirely sure how our new release
> process is
> >>>> going to work.
> >>>>
> >>>> Will we be releasing 4.1 builds immediately, as part of shippable
> trunk?
> >>>> Or will 4.0 be our only active line of software for the next year?
> >>>>
> >>>> Either way, I bet my bottom dollar there will come some regret if we
> >>>> introduce such divergence between the two most active branches we
> maintain,
> >>>> so early in their lifecycles. If we invest significant resources in
> >>>> improved testing using this framework (which I very much expect) then
> >>>> branches that are not compatible will not benefit, likely reducing
> their
> >>>> quality; and the risk of backports will increase, due to divergence.
> >>>>
> >>>> Altogether, I think it would be a huge mistake. But if we will be
> shipping
> >>>> releases soon that can fix these aforementioned regressions, I won’t
> >>>> campaign for it.
> >>>>
> >>>>
> >>>>
> >>>> From: bened...@apache.org <bened...@apache.org>
> >>>> Date: Tuesday, 13 July 2021 at 13:31
> >>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> >>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> >>>> No change is without risk; we have introduced serious regressions
> with bug
> >>>> fixes to patch releases. The overall risk to the release lifecycle is
> >>>> reduced significantly in my opinion, as we reduce the likelihood of
> >>>> introducing regressions, and can use the same test infrastructure
> across
> >>>> all of the actively developed releases, increasing our confidence in
> 4.0.x
> >>>> releases.
> >>>>
> >>>> Furthermore, we introduced a significant performance regression in all
> >>>> lines of the software by increasing the number of LWT round-trips.
> Unless
> >>>> we intend to leave this regression for a further year without _any_
> release
> >>>> offering a solution, we will need suitable verification mechanisms for
> >>>> whatever fixes we deliver.
> >>>>
> >>>> My view is that it is unacceptable to leave such a significant
> regression
> >>>> unaddressed in all lines of software we intend to release for the
> >>>> foreseeable future.
> >>>>
> >>>>
> >>>> From: Paulo Motta <pauloricard...@gmail.com>
> >>>> Date: Tuesday, 13 July 2021 at 13:21
> >>>> To: Cassandra DEV <dev@cassandra.apache.org>
> >>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> >>>>> No, in my opinion the target should be 4.0.x. We are reaching for a
> >>>> shippable trunk and this has no public API impacts. This work is IMO
> >>>> central to achieving a shippable trunk, either way. The only reason I
> do
> >>>> not target 3.x is that it would be too burdensome.
> >>>>
> >>>> In my limited view of the proposal, a major refactor of internal
> >>>> concurrency APIs to support the testing facility potentially risks the
> >>>> stability of a minor release, something we've been wanting to avoid
> with
> >>>> our focus on stability. So I'd prefer this to go in  trunk/4.1,
> otherwise
> >>>> we will create a precedent for including non-bugfix changes in minor
> >>>> versions, something I think we should avoid.
> >>>>
> >>>> In the past we've been lenient about including seemingly harmless
> internal
> >>>> changes that caused client impact and we should be careful to avoid
> this in
> >>>> the future. To prevent this I think we should take a strict approach
> and
> >>>> only accept bug fixes in minor (ie. 4.0.x) versions moving forward.
> >>>>
> >>>> I'd go one step further and propose that any CEPs, which are generally
> >>>> about new features, major API changes or internal refactorings,
> should only
> >>>> be allowed in subsequent major versions, unless an explicit exception
> is
> >>>> granted.
> >>>>
> >>>> On Tue, 13 Jul 2021 at 07:11, bened...@apache.org <
> >>>> bened...@apache.org> wrote:
> >>>>
> >>>>> Perhaps it’s worth looking forward at the roadmap that we plan to
> >>>> develop,
> >>>>> and consider whether such a facility would be welcome for proving
> their
> >>>>> safety, and we can then worry about evolving the specifics of any
> API(s)
> >>>>> together as we deploy the capability? Looking ahead, there are very
> few
> >>>>> major features I wouldn’t want to see exercised with this approach,
> given
> >>>>> the choice.
> >>>>>
> >>>>> The LWT Verifier by itself is an integration test that covers many
> of the
> >>>>> affected subsystems, including sstables, memtables and repair. But we
> >>>> will
> >>>>> have the ability to introduce dedicated verification for each of
> these
> >>>>> features and systems, and we will necessarily produce more robust
> code
> >>>>> (repair is a great example of a brittle system that would be
> impossible
> >>>> to
> >>>>> produce with such an adversarial test system)
> >>>>>
> >>>>>
> >>>>> *Query side improvements:*
> >>>>>
> >>>>> * Storage Attached Index or SAI. The CEP can be found at
> >>>>>
> >>>>>
> >>>>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index
> >>>>> * Add support for OR predicates in the CQL where clause
> >>>>> * Allow aggregation by time intervals (CASSANDRA-11871) and allow
> UDFs
> >>>>> in GROUP BY clause
> >>>>> * Ability to read the TTL and WRITE TIME of an element in a
> collection
> >>>>> (CASSANDRA-8877)
> >>>>> * Multi-Partition LWTs
> >>>>> * Materialized views hardening: Addressing the different Materialized
> >>>>> Views issues (see CASSANDRA-15921 and [1] for some of the work
> involved)
> >>>>>
> >>>>> *Security improvements:*
> >>>>>
> >>>>> * SSTables encryption (CASSANDRA-9633)
> >>>>> * Add support for Dynamic Data Masking (CEP pending)
> >>>>> * Allow the creation of roles that have the ability to assign
> arbitrary
> >>>>> privileges, or scoped privileges without also granting those roles
> access
> >>>>> to database objects.
> >>>>> * Filter rows from system and system_schema based on users
> permissions
> >>>>> (CASSANDRA-15871)
> >>>>>
> >>>>> *Performance improvements:*
> >>>>>
> >>>>> * Trie-based index format (CEP pending)
> >>>>> * Trie-based memtables (CEP pending)
> >>>>> * Paxos improvements: Paxos / LWT implementation that would enable
> the
> >>>>> database to serve serial writes with two round-trips and serial reads
> >>>> with
> >>>>> one round-trip in the uncontended case
> >>>>>
> >>>>> *Safety/Usability improvements:*
> >>>>>
> >>>>> * Guardrails. The CEP can be found at
> >>>>>
> >>>>>
> >>>>
> https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails
> >>>>> * Add ability to track state in repair (CASSANDRA-15399)
> >>>>> * Repair coordinator improvements (CASSANDRA-15399)
> >>>>> * Make incremental backup configurable per keyspace and table
> >>>>> (CASSANDRA-15402)
> >>>>> * Add ability to blacklist a CQL partition so all requests are
> ignored
> >>>>> (CASSANDRA-12106)
> >>>>> * Add default and required keyspace replication options
> >>>> (CASSANDRA-14557)
> >>>>> * Transactional Cluster Metadata: Use of transactions to propagate
> >>>>> cluster metadata
> >>>>> * Downgrade-ability: Ability to downgrade in the event
> >>>> that
> >>>>> a serious issue has been identified
> >>>>>
> >>>>> *Pluggability improvements:*
> >>>>>
> >>>>> * Pluggable schema manager (CEP pending)
> >>>>> * Pluggable filesystem (CEP pending)
> >>>>> * Pluggable authenticator for CQLSH (CASSANDRA-16456). A CEP draft
> can
> >>>> be
> >>>>> found at
> >>>>>
> >>>>>
> >>>>
> https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit
> >>>>> * Memtable API (CEP pending). The goal being to allow improvements
> such
> >>>>> as CASSANDRA-13981 to be easily plugged into Cassandra
> >>>>>
> >>>>> *Memtable pluggable implementation:*
> >>>>>
> >>>>> * Enable Cassandra for Persistent Memory (CASSANDRA-13981)
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> From: bened...@apache.org <bened...@apache.org>
> >>>>> Date: Tuesday, 13 July 2021 at 10:51
> >>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> >>>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> >>>>> Ach, editing code in the email editor isn’t smart when editors all
> have
> >>>>> different meanings for key combinations (accidentally hit send), but
> you
> >>>>> get the idea. The simulator would intercept these thread executions,
> the
> >>>>> memory accesses for the annotated field, and evaluate them so that in
> >>>> some
> >>>>> cases the assertions would fail.
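As a rough picture of how controlling thread execution can expose the failure in the toy `foo()` example, the sketch below enumerates every interleaving of two calls by hand. It is not the simulator from the CEP: the class and method names are hypothetical, and splitting `foo()` into three atomic steps (read `x`, write `x`, read `x` for the return value) is a modelling assumption made only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class InterleavingSketch {
    // State of one simulated run: the shared field x plus per-thread locals.
    static final class Run {
        int x = 0;
        final int[] tmp = new int[2]; // value each thread read from x
        final int[] ret = new int[2]; // value each thread returns from foo()
        final int[] pc = new int[2];  // next step index for each thread

        // Execute the next atomic step of foo() on behalf of thread t.
        void step(int t) {
            switch (pc[t]++) {
                case 0: tmp[t] = x; break;     // read x
                case 1: x = tmp[t] + 1; break; // write x
                case 2: ret[t] = x; break;     // read x again for the return
                default: throw new IllegalStateException("thread already finished");
            }
        }
    }

    // Collect every schedule containing three steps of thread 0 and three of thread 1.
    static void schedules(int[] prefix, int len, int a, int b, List<int[]> out) {
        if (a == 3 && b == 3) { out.add(prefix.clone()); return; }
        if (a < 3) { prefix[len] = 0; schedules(prefix, len + 1, a + 1, b, out); }
        if (b < 3) { prefix[len] = 1; schedules(prefix, len + 1, a, b + 1, out); }
    }

    static List<int[]> allSchedules() {
        List<int[]> out = new ArrayList<>();
        schedules(new int[6], 0, 0, 0, out);
        return out;
    }

    // True when the toy test's assertion would hold under this schedule.
    static boolean assertionHolds(int[] schedule) {
        Run run = new Run();
        for (int t : schedule) run.step(t);
        return run.ret[0] == 1 || run.ret[1] == 1;
    }

    static long countFailures() {
        return allSchedules().stream().filter(s -> !assertionHolds(s)).count();
    }

    public static void main(String[] args) {
        System.out.println(allSchedules().size() + " schedules, "
                + countFailures() + " violate the assertion");
    }
}
```

Under this model only some schedules violate the assertion, namely those where one thread completes its write before the other reads and both final reads happen after the second write. That matches the point that the assertion fails only in some cases.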
> >>>>>
> >>>>> This is obviously a toy example that is not very interesting, but the
> >>>> main
> >>>>> real example we have is too complicated to produce a snippet to
> >>>>> demonstrate. In my view, the long term outcome of this work is
> likely the
> >>>>> enablement of many unit tests that are a little more complicated than
> >>>> this,
> >>>>> on less obvious code.
> >>>>>
> >>>>> But the headline goal of the CEP is not. By itself, the LWT Verifier
> >>>>> demonstrates the power and utility of the work. I don’t believe it is
> >>>>> terribly helpful to focus on secondary justifications like the
> example I
> >>>>> gave. For me, the _ability_ to prove the correctness of difficult but
> >>>>> critical systems is justification enough, whether or not we deliver a
> >>>>> simple API as part of the CEP.
> >>>>>
> >>>>>
> >>>>>
> >>>>> From: bened...@apache.org <bened...@apache.org>
> >>>>> Date: Tuesday, 13 July 2021 at 10:43
> >>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> >>>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> >>>>>> Should the target release be 4.1 (not 4.0.x)?
> >>>>>
> >>>>>
> >>>>>
> >>>>> No, in my opinion the target should be 4.0.x. We are reaching for a
> >>>>> shippable trunk and this has no public API impacts. This work is IMO
> >>>>> central to achieving a shippable trunk, either way. The only reason
> I do
> >>>>> not target 3.x is that it would be too burdensome.
> >>>>>
> >>>>>> My concern is that changing code and tests at the same time risks
> >>>>> regressions…
> >>>>>
> >>>>>
> >>>>>
> >>>>> I’ve never heard this position before. Would you care to elaborate?
> It is
> >>>>> quite normal for us to update tests alongside changes to the code.
> >>>>>
> >>>>>> And seconding Benjamin's comments… some documentation on how to
> write a
> >>>>> test, and a simple test example, that this CEP then allows us to
> write
> >>>>> would help a lot (a la "working backwards").
> >>>>>
> >>>>> 1) This work is to _enable_ the development of tests, with the only
> test
> >>>>> originally planned to arrive alongside it the fairly sophisticated
> LWT
> >>>>> Verifier. This is something we have sorely needed as a project, as we
> >>>> have
> >>>>> had serious correctness violations for multiple years. This broad
> >>>> category
> >>>>> of integrated test for verifying correctness is the main goal of the
> work
> >>>>> and is not easily condensed into an example snippet.
> >>>>> 2) It is _possible_ that some simple and fluid APIs will be
> introduced in
> >>>>> a later phase of this work, but they haven’t been designed yet, so I
> >>>> cannot
> >>>>> share snippets.
> >>>>>
> >>>>> In principle, however, you would be able to do something like:
> >>>>>
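> >>>>> // @Nemesis is illustrative: the simulator would intercept accesses to x
> >>>>> @Nemesis volatile int x = 0;
> >>>>> int foo() {
> >>>>>  x = x + 1; // non-atomic read-modify-write
> >>>>>  return x;
> >>>>> }
> >>>>>
> >>>>> @Test
> >>>>> void test() throws Exception {
> >>>>>  Future<Integer> f1 = executor.submit(() -> foo());
> >>>>>  Future<Integer> f2 = executor.submit(() -> foo());
> >>>>>  // fails under interleavings where both calls return 2
> >>>>>  Assert.assertTrue(f1.get() == 1 || f2.get() == 1);
> >>>>> }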
> >>>>>
> >>>>>
> >>>>> From: Mick Semb Wever <m...@apache.org>
> >>>>> Date: Tuesday, 13 July 2021 at 10:28
> >>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> >>>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> >>>>>>
> >>>>>> To achieve this, significant modifications will be required to the
> >>>>> codebase, mostly cleaning up existing abstractions. Specifically, we
> will
> >>>>> need to be able to mock executors, any blocking concurrency
> primitives,
> >>>>> time, filesystem access and internode streaming.
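To make "mock time" concrete, here is a minimal sketch of the general technique using the standard `java.time.Clock` abstraction. It does not show how the CEP implements this: `ManualClock` and `Lease` are hypothetical names invented for the illustration, and the only assumption is that code under test reads time through an injected clock rather than from the system directly.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class MockTimeSketch {
    // A clock that only moves when the test tells it to (hypothetical helper).
    static final class ManualClock extends Clock {
        private Instant now;

        ManualClock(Instant start) { this.now = start; }

        void advance(Duration d) { now = now.plus(d); }

        @Override public ZoneId getZone() { return ZoneOffset.UTC; }
        @Override public Clock withZone(ZoneId zone) { return this; } // zone is ignored in this sketch
        @Override public Instant instant() { return now; }
    }

    // Example of time-dependent code that takes its clock as a dependency.
    static final class Lease {
        private final Clock clock;
        private final Instant expiresAt;

        Lease(Clock clock, Duration ttl) {
            this.clock = clock;
            this.expiresAt = clock.instant().plus(ttl);
        }

        boolean isExpired() { return !clock.instant().isBefore(expiresAt); }
    }

    public static void main(String[] args) {
        ManualClock clock = new ManualClock(Instant.EPOCH);
        Lease lease = new Lease(clock, Duration.ofSeconds(10));
        System.out.println(lease.isExpired()); // clock has not moved yet
        clock.advance(Duration.ofSeconds(10));
        System.out.println(lease.isExpired()); // clock has reached the expiry instant
    }
}
```

Because the test owns the clock, expiry can be exercised deterministically without sleeping. The same inversion is what makes executors and blocking primitives controllable by a test harness.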
> >>>>>>
> >>>>>> The work is – in large part – already complete, with JIRA and PRs to
> >>>>> follow in the coming weeks. Of course, the work is subject to the
> usual
> >>>>> community input and review, so this does not preclude changes to the
> work
> >>>>> (even significant ones, if they are warranted). I know a lot of
> incoming
> >>>>> CEP are likely to be backed up by significant off-list development
> as a
> >>>>> result of the focus on a shippable 4.0. Hopefully this is just a
> >>>> temporary
> >>>>> growing pain, particularly as we move towards a shippable trunk.
> >>>>>>
> >>>>>> I hope this work will be of huge value to the project, particularly
> as
> >>>>> we race to catch up on years of limited feature development.
> >>>>>>
> >>>>>> JIRA and PRs will follow, but I wanted to kick-off discussion in
> >>>> advance.
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> Should the target release be 4.1 (not 4.0.x)?
> >>>>>
> >>>>> I'd be interested in seeing a rough timeline/plan of how the proposed
> >>>>> changes are to be defined in JIRAs and ordered.
> >>>>>
> >>>>> I'd like to hear a bit more about the test plan. Not so much about
> how
> >>>>> the CEP itself improves testability of the project, but for example
> >>>>> the testing required to be in place to introduce the changes of the
> >>>>> CEP (and if it already exists, where). My concern is that changing
> >>>>> code and tests at the same time risks regressions…
> >>>>>
> >>>>> And seconding Benjamin's comments… some documentation on how to write
> >>>>> a test, and a simple test example, that this CEP then allows us to
> >>>>> write would help a lot (a la "working backwards").
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>>>>
> >>>>
> >>
> >>
> >
> >
>
