I feel that #4 (fix bug and add flag to roll back to old behavior) is best.
About the alternative implementation, I am fine adding it to 3.x and 4.0,
but we should treat it as a different path, disabled by default, that you
can opt into, with a plan to enable it by default "eventually".

On Wed, Nov 18, 2020 at 11:10 AM Benedict Elliott Smith <bened...@apache.org> wrote:

> Perhaps there might be broader appetite to weigh in on which major
> releases we might target for work that fixes the correctness bug without
> serious performance regression?
>
> i.e., if we were to fix the correctness bug now, introducing a serious
> performance regression (either opt-in or opt-out), but were to land work
> without this problem for 5.0, would there be appetite to backport this
> work to any of 4.0, 3.11 or 3.0?
>
>
> On 18/11/2020, 18:31, "Jeff Jirsa" <jji...@gmail.com> wrote:
>
> This is complicated and relatively few people on earth understand it, so
> having little feedback is mostly expected, unfortunately.
>
> My normal emotional response is "correctness is required, opt-in to
> performance improvements that sacrifice strict correctness", but I'm also
> sure this is going to surprise people, and would understand / accept #4
> (default to current, opt-in to correct).
>
>
> On Wed, Nov 18, 2020 at 4:54 AM Benedict Elliott Smith <bened...@apache.org> wrote:
>
> > It doesn't seem like there's much enthusiasm for any of the options
> > available here...
> >
> > On 12/11/2020, 14:37, "Benedict Elliott Smith" <bened...@apache.org> wrote:
> >
> > > Is the new implementation a separate, distinctly modularized new
> > > body of work
> >
> > It’s primarily a distinct, modularised and new body of work, however
> > there is some shared code that has been modified - namely PaxosState,
> > in which legacy code is maintained but modified for compatibility, and
> > the system.paxos table (which receives a new column, and slightly
> > modified serialization code). It is conceptually an optimised version
> > of the existing algorithm.
> >
> > If there's a chance of being of value to 4.0, I can try to put up a
> > patch next week alongside a high level description of the changes.
> >
> > > But a performance regression is a regression, I'm not shrugging it
> > > off.
> >
> > I don't want to give the impression I'm shrugging off the correctness
> > issue either. It's a serious issue to fix, but since all successful
> > updates to the database are linearizable, I think it's likely that
> > many applications behave correctly with the present semantics, or at
> > least encounter only transient errors. No doubt many also do not, but
> > I have no idea of the ratio.
> >
> > The regression isn't itself a simple issue either - depending on the
> > topology and message latencies it is not difficult to produce
> > inescapable contention, i.e. guaranteed timeouts - that might persist
> > as long as clients continue to retry. It could be quite a serious
> > degradation of service to impose on our users.
> >
> > I don't pretend to know the correct way to make a decision balancing
> > these considerations, but I am perhaps more concerned about imposing
> > service outages than I am temporarily maintaining semantics our users
> > have apparently accepted for years - though I absolutely share your
> > embarrassment there.
> >
> >
> > On 12/11/2020, 12:41, "Joshua McKenzie" <jmcken...@apache.org> wrote:
> >
> > Is the new implementation a separate, distinctly modularized new body
> > of work or does it make substantial changes to existing implementation
> > and subsume it?
> >
> > On Thu, Nov 12, 2020 at 3:56 AM Sylvain Lebresne <lebre...@gmail.com> wrote:
> >
> > > Regarding option #4, I'll remark that experience tends to suggest
> > > users don't consistently read the `NEWS.txt` file on upgrade, so
> > > option #4 will likely essentially mean "LWT has a correctness issue,
> > > but once it broke your data enough that you'll notice, you'll be
> > > able to dig the proper flag to fix it for next time". I guess it's
> > > better than nothing, of course, but I'll admit that defaulting to
> > > "opt-in correctness", especially for a feature (LWT) that exists
> > > uniquely to provide additional guarantees, is something I have a
> > > hard time rallying behind.
> > >
> > > But a performance regression is a regression, I'm not shrugging it
> > > off. Still, I feel we shouldn't leave LWT with a fairly serious
> > > known correctness bug and I frankly feel bad for "the project" that
> > > this has been known for so long without action, so I'm a bit biased
> > > in wanting to get it fixed asap.
> > >
> > > But maybe I'm overstating the urgency here, and maybe option #1 is a
> > > better way forward.
> > >
> > > --
> > > Sylvain
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org