Re: [DISCUSS] CASSANDRA-12126: LWTs correctness and performance

Benedict Elliott Smith Wed, 18 Nov 2020 11:10:39 -0800

Perhaps there might be broader appetite to weigh in on which major releases we 
might target for work that fixes the correctness bug without serious 
performance regression?


i.e., if we were to fix the correctness bug now, introducing a serious 
performance regression (either opt-in or opt-out), but were to land work 
without this problem for 5.0, would there be appetite to backport this work to 
any of 4.0, 3.11 or 3.0? 


On 18/11/2020, 18:31, "Jeff Jirsa" <jji...@gmail.com> wrote:

    This is complicated and relatively few people on earth understand it, so
    having little feedback is mostly expected, unfortunately.

    My normal emotional response is "correctness is required, opt-in to
    performance improvements that sacrifice strict correctness", but I'm also
    sure this is going to surprise people, and would understand / accept #4
    (default to current, opt-in to correct).


    On Wed, Nov 18, 2020 at 4:54 AM Benedict Elliott Smith <bened...@apache.org>
    wrote:

    > It doesn't seem like there's much enthusiasm for any of the options
    > available here...
    >
    > On 12/11/2020, 14:37, "Benedict Elliott Smith" <bened...@apache.org>
    > wrote:
    >
    >     > Is the new implementation a separate, distinctly modularized new
    >     > body of work
    >
    >     It’s primarily a distinct, modularised and new body of work, however
    >     there is some shared code that has been modified - namely PaxosState, in
    >     which legacy code is maintained but modified for compatibility, and the
    >     system.paxos table (which receives a new column, and slightly modified
    >     serialization code).  It is conceptually an optimised version of the
    >     existing algorithm.
    >
    >     If there's a chance of being of value to 4.0, I can try to put up a
    >     patch next week alongside a high level description of the changes.
    >
    >     > But a performance regression is a regression, I'm not shrugging it off.
    >
    >     I don't want to give the impression I'm shrugging off the correctness
    >     issue either. It's a serious issue to fix, but since all successful updates
    >     to the database are linearizable, I think it's likely that many
    >     applications behave correctly with the present semantics, or at least
    >     encounter only transient errors. No doubt many also do not, but I have no
    >     idea of the ratio.
    >
    >     The regression isn't itself a simple issue either - depending on the
    >     topology and message latencies it is not difficult to produce inescapable
    >     contention, i.e. guaranteed timeouts - that might persist as long as
    >     clients continue to retry. It could be quite a serious degradation of
    >     service to impose on our users.
    >
    >     I don't pretend to know the correct way to make a decision balancing
    >     these considerations, but I am perhaps more concerned about imposing
    >     service outages than I am temporarily maintaining semantics our users have
    >     apparently accepted for years - though I absolutely share your
    >     embarrassment there.
    >
    >
    >     On 12/11/2020, 12:41, "Joshua McKenzie" <jmcken...@apache.org> wrote:
    >
    >         Is the new implementation a separate, distinctly modularized new body of
    >         work or does it make substantial changes to the existing implementation and
    >         subsume it?
    >
    >         On Thu, Nov 12, 2020 at 3:56 AM Sylvain Lebresne <lebre...@gmail.com> wrote:
    >
    >         > Regarding option #4, I'll remark that experience tends to suggest users
    >         > don't consistently read the `NEWS.txt` file on upgrade, so option #4 will
    >         > likely essentially mean "LWT has a correctness issue, but once it broke
    >         > your data enough that you'll notice, you'll be able to dig up the proper flag
    >         > to fix it for next time". I guess it's better than nothing, of course, but
    >         > I'll admit that defaulting to "opt-in correctness", especially for a
    >         > feature (LWT) that exists uniquely to provide additional guarantees, is
    >         > something I have a hard time rallying behind.
    >         >
    >         > But a performance regression is a regression, I'm not shrugging it off.
    >         > Still, I feel we shouldn't leave LWT with a fairly serious known
    >         > correctness bug and I frankly feel bad for "the project" that this has been
    >         > known for so long without action, so I'm a bit biased in wanting to get it
    >         > fixed asap.
    >         >
    >         > But maybe I'm overstating the urgency here, and maybe option #1 is a better
    >         > way forward.
    >         >
    >         > --
    >         > Sylvain
    >         >
    >
    >
    >
    >     ---------------------------------------------------------------------
    >     To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
    >     For additional commands, e-mail: dev-h...@cassandra.apache.org
    >
    >
    >
    >
    >
    >



