Isn't the plan to change the LWT implementation (and performance expectations) in a patch version? That is a breaking change in itself; I'm just proposing to make the trade-off explicit in the yaml to prevent unexpected performance degradation during upgrades (for users who are not aware of the change).
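A minimal sketch of the startup check this proposal implies, written in Java since that is Cassandra's implementation language. It is illustrative only: `LwtModeCheck` and `legacyModeFrom` are hypothetical names, the yaml is assumed to be already parsed into a `Map`, and `lwt_legacy_mode` is simply the key name suggested in this thread, not a confirmed Cassandra setting.

```java
import java.util.Map;

/**
 * Hypothetical startup check (not Cassandra's real config loader): refuse to
 * start unless the operator has explicitly chosen an LWT mode in the yaml.
 */
public final class LwtModeCheck {

    // Key name as proposed in this thread; an assumption, not a shipped setting.
    static final String KEY = "lwt_legacy_mode";

    private LwtModeCheck() {}

    /**
     * Returns true if the legacy (pre-fix) LWT behavior was requested, false if
     * the corrected behavior was requested. Throws if the key is absent or is not
     * a boolean, so an old yaml template fails fast instead of silently changing
     * performance characteristics on upgrade.
     */
    static boolean legacyModeFrom(Map<String, Object> yaml) {
        if (!yaml.containsKey(KEY)) {
            throw new IllegalStateException(
                KEY + " must be set explicitly in cassandra.yaml (see CASSANDRA-12126): "
                + "false = corrected LWT semantics, true = previous behavior and performance");
        }
        Object value = yaml.get(KEY);
        if (!(value instanceof Boolean)) {
            throw new IllegalStateException(KEY + " must be true or false, got: " + value);
        }
        return (Boolean) value;
    }

    public static void main(String[] args) {
        // A freshly generated yaml ships with the key set to false (corrected behavior).
        System.out.println(legacyModeFrom(Map.of(KEY, false)));   // false
        // An operator who wants the previous performance opts in explicitly.
        System.out.println(legacyModeFrom(Map.of(KEY, true)));    // true
        // A yaml copied from an older 3.x template lacks the key: startup is aborted.
        try {
            legacyModeFrom(Map.of("cluster_name", "Test Cluster"));
        } catch (IllegalStateException e) {
            System.out.println("startup aborted: " + e.getMessage());
        }
    }
}
```

Failing fast on a missing key, rather than falling back to a default, is what turns the upgrade into an explicit decision; the cost, as Benedict points out in this thread, is that a patch upgrade with an old yaml will also fail to start.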
Just to make it clear, I'm proposing having "lwt_legacy_mode: false" uncommented in the default yaml with a descriptive comment about CASSANDRA-12126, so new users will always get the new behavior, but users with a yaml template based on a previous 3.X version will not be able to start the node because this property will be missing. I believe the majority of operators will just update their yaml with "lwt_legacy_mode: false" and move on with their upgrades, but people who want to keep the previous performance will become aware of the breaking change and set it to true.

On Mon, Nov 23, 2020 at 21:07, Benedict Elliott Smith <bened...@apache.org> wrote:

> What do you mean by minor upgrade? We can't break patch upgrades for any of 3.x, as this could also cause surprise outages.
>
> On 23/11/2020, 23:51, "Paulo Motta" <pauloricard...@gmail.com> wrote:
>
> > I was thinking about the YAML requirement during the 3.X minor upgrade to make the decision explicit (need to update yaml) rather than implicit (by upgrading you agree with the change), since the latter can go unnoticed by those who don't pay attention to NEWS.txt.
> >
> > On Mon, Nov 23, 2020 at 20:03, Benedict Elliott Smith <bened...@apache.org> wrote:
> >
> > > What's the value of the yaml? The user is likely to have upgraded to the latest 3.x as part of the upgrade process to 4.0, so they'll already have had a decision made for them. If correctness didn't break anything, there doesn't any longer seem much point in offering a choice?
> > >
> > > On 23/11/2020, 22:45, "Brandon Williams" <dri...@gmail.com> wrote:
> > >
> > > > +1 to both as well.
> > > >
> > > > On Mon, Nov 23, 2020, 4:42 PM Blake Eggleston <beggles...@apple.com.invalid> wrote:
> > > >
> > > > > +1 to correctness, and I like the yaml idea
> > > > >
> > > > > On Nov 23, 2020, at 4:20 AM, Paulo Motta <pauloricard...@gmail.com> wrote:
> > > > >
> > > > > > +1 to defaulting for correctness.
> > > > > >
> > > > > > In addition to that, how about making it a mandatory cassandra.yaml property defaulting to correctness? This would make upgrades with an old cassandra.yaml fail unless an option is explicitly specified, making operators aware of the issue and forcing them to make a choice.
> > > > > >
> > > > > > On Mon, Nov 23, 2020 at 07:30, Benjamin Lerer <benjamin.le...@datastax.com> wrote:
> > > > > >
> > > > > > > Thank you very much to everybody who provided feedback. It helped a lot to limit our options.
> > > > > > >
> > > > > > > Unfortunately, it seems that some poor soul (me, really!!!) will have to make the final call between #3 and #4.
> > > > > > >
> > > > > > > If I reformulate the question as: do we default to *correctness* or to *performance*?
> > > > > > >
> > > > > > > I would choose to default to *correctness*.
> > > > > > >
> > > > > > > Of course the situation is more complex than that, but it seems that somebody has to make a call and live with it. It seems to me that being blamed for choosing correctness is easier to live with ;-)
> > > > > > >
> > > > > > > Benjamin
> > > > > > >
> > > > > > > PS: I tried to push the choice on Sylvain but he dodged the bullet.
> > > > > > >
> > > > > > > On Sat, Nov 21, 2020 at 12:30 AM Benedict Elliott Smith <bened...@apache.org> wrote:
> > > > > > >
> > > > > > > > I think I meant #4 🤷‍♂️
> > > > > > > >
> > > > > > > > On 20/11/2020, 21:11, "Blake Eggleston" <beggles...@apple.com.INVALID> wrote:
> > > > > > > >
> > > > > > > > > I'd also prefer #3 over #4
> > > > > > > > >
> > > > > > > > > On Nov 20, 2020, at 10:03 AM, Benedict Elliott Smith <bened...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > Well, I expressed a preference for #3 over #4, particularly for the 3.x series.
> > > > > > > > > > However, at this point, I think the lack of a clear project decision means we can punt it back to you and Sylvain to make the final call.
> > > > > > > > > >
> > > > > > > > > > On 20/11/2020, 16:23, "Benjamin Lerer" <benjamin.le...@datastax.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > I will try to summarize the discussion to clarify the outcome.
> > > > > > > > > > >
> > > > > > > > > > > Mick is in favor of #4
> > > > > > > > > > > Summanth is in favor of #4
> > > > > > > > > > > Sylvain's answer was not clear to me. I understood it as "I prefer #3 to #4, and I am also fine with #1"
> > > > > > > > > > > Jeff is in favor of #3 and will understand #4
> > > > > > > > > > > David is in favor of #3 (fix bug and add flag to roll back to old behavior) in 4.0 and #4 in 3.0 and 3.11
> > > > > > > > > > >
> > > > > > > > > > > Do not hesitate to correct me if I misunderstood your answer.
> > > > > > > > > > >
> > > > > > > > > > > Based on these answers it seems clear that most people prefer to go for #3 or #4.
> > > > > > > > > > >
> > > > > > > > > > > The choice between #3 (fix correctness, opt-in to current behavior) and #4 (current behavior, opt-in to correctness) is a bit less clear, especially if we consider the 3.X branches or 4.0.
> > > > > > > > > > >
> > > > > > > > > > > Does anybody have some idea on how to choose between those 2 choices, or some extra opinions on #3 versus #4?
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Nov 18, 2020 at 9:45 PM David Capwell <dcapw...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I feel that #4 (fix bug and add flag to roll back to old behavior) is best.
> > > > > > > > > > > >
> > > > > > > > > > > > About the alternative implementation, I am fine with adding it to 3.x and 4.0, but we should treat it as a different path, disabled by default, that you can opt into, with a plan to opt in by default "eventually".
> > > >>>>> > > > >>>>> On Wed, Nov 18, 2020 at 11:10 AM Benedict Elliott Smith < > > > >>>>> bened...@apache.org> > > > >>>>> wrote: > > > >>>>> > > > >>>>>> Perhaps there might be broader appetite to weigh in on > which > > > >> major > > > >>>>>> releases we might target for work that fixes the > correctness > > bug > > > >>> without > > > >>>>>> serious performance regression? > > > >>>>>> > > > >>>>>> i.e., if we were to fix the correctness bug now, > introducing a > > > >>> serious > > > >>>>>> performance regression (either opt-in or opt-out), but > were to > > > >>> land work > > > >>>>>> without this problem for 5.0, would there be appetite to > > backport > > > >>> this > > > >>>>> work > > > >>>>>> to any of 4.0, 3.11 or 3.0? > > > >>>>>> > > > >>>>>> > > > >>>>>> On 18/11/2020, 18:31, "Jeff Jirsa" <jji...@gmail.com> > wrote: > > > >>>>>> > > > >>>>>> This is complicated and relatively few people on earth > > > >>> understand it, > > > >>>>>> so > > > >>>>>> having little feedback is mostly expected, > unfortunately. > > > >>>>>> > > > >>>>>> My normal emotional response is "correctness is > required, > > > >>> opt-in to > > > >>>>>> performance improvements that sacrifice strict > correctness", > > > >>> but I'm > > > >>>>>> also > > > >>>>>> sure this is going to surprise people, and would > understand > > / > > > >>> accept > > > >>>>> #4 > > > >>>>>> (default to current, opt-in to correct). > > > >>>>>> > > > >>>>>> > > > >>>>>> On Wed, Nov 18, 2020 at 4:54 AM Benedict Elliott > Smith < > > > >>>>>> bened...@apache.org> > > > >>>>>> wrote: > > > >>>>>> > > > >>>>>>> It doesn't seem like there's much enthusiasm for any > of the > > > >>> options > > > >>>>>>> available here... 
> > > >>>>>>> > > > >>>>>>> On 12/11/2020, 14:37, "Benedict Elliott Smith" < > > > >>>>> bened...@apache.org > > > >>>>>>> > > > >>>>>>> wrote: > > > >>>>>>> > > > >>>>>>>> Is the new implementation a separate, distinctly > modularized > > > >>>>>> new > > > >>>>>>> body of work > > > >>>>>>> > > > >>>>>>> It’s primarily a distinct, modularised and new body > of > > work, > > > >>>>>> however > > > >>>>>>> there is some shared code that has been modified - > namely > > > >>>>>> PaxosState, in > > > >>>>>>> which legacy code is maintained but modified for > > compatibility, > > > >>> and > > > >>>>>> the > > > >>>>>>> system.paxos table (which receives a new column, and > slightly > > > >>>>>> modified > > > >>>>>>> serialization code). It is conceptually an optimised > > version of > > > >>>>> the > > > >>>>>>> existing algorithm. > > > >>>>>>> > > > >>>>>>> If there's a chance of being of value to 4.0, I can > try to > > > >> put > > > >>>>>> up a > > > >>>>>>> patch next week alongside a high level description of > the > > > >> changes. > > > >>>>>>> > > > >>>>>>>> But a performance regression is a regression, I'm not > > > >>>>>> shrugging it > > > >>>>>>> off. > > > >>>>>>> > > > >>>>>>> I don't want to give the impression I'm shrugging > off the > > > >>>>>> correctness > > > >>>>>>> issue either. It's a serious issue to fix, but since > all > > > >>> successful > > > >>>>>> updates > > > >>>>>>> to the database are linearizable, I think it's likely > that > > many > > > >>>>>>> applications behave correctly with the present > semantics, or > > at > > > >>>>> least > > > >>>>>>> encounter only transient errors. No doubt many also do > not, > > but > > > >> I > > > >>>>>> have no > > > >>>>>>> idea of the ratio. 
> > > >>>>>>> > > > >>>>>>> The regression isn't itself a simple issue either - > > depending > > > >>>>> on > > > >>>>>> the > > > >>>>>>> topology and message latencies it is not difficult to > produce > > > >>>>>> inescapable > > > >>>>>>> contention, i.e. guaranteed timeouts - that might > persist as > > > >> long > > > >>>>> as > > > >>>>>>> clients continue to retry. It could be quite a serious > > > >> degradation > > > >>>>> of > > > >>>>>>> service to impose on our users. > > > >>>>>>> > > > >>>>>>> I don't pretend to know the correct way to make a > decision > > > >>>>>> balancing > > > >>>>>>> these considerations, but I am perhaps more concerned > about > > > >>>>> imposing > > > >>>>>>> service outages than I am temporarily maintaining > semantics > > our > > > >>>>>> users have > > > >>>>>>> apparently accepted for years - though I absolutely > share > > your > > > >>>>>>> embarrassment there. > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> On 12/11/2020, 12:41, "Joshua McKenzie" < > > > >> jmcken...@apache.org > > > >>>>>> > > > >>>>>> wrote: > > > >>>>>>> > > > >>>>>>> Is the new implementation a separate, distinctly > > > >>>>> modularized > > > >>>>>> new > > > >>>>>>> body of > > > >>>>>>> work or does it make substantial changes to > existing > > > >>>>>>> implementation and > > > >>>>>>> subsume it? > > > >>>>>>> > > > >>>>>>> On Thu, Nov 12, 2020 at 3:56 AM Sylvain Lebresne > < > > > >>>>>>> lebre...@gmail.com> wrote: > > > >>>>>>> > > > >>>>>>>> Regarding option #4, I'll remark that experience > tends to > > > >>>>>>> suggest users > > > >>>>>>>> don't consistently read the `NEWS.txt` file on > upgrade, > > > >>>>> so > > > >>>>>>> option #4 will > > > >>>>>>>> likely essentially mean "LWT has a correctness issue, > but > > > >>>>>> once > > > >>>>>>> it broke > > > >>>>>>>> your data enough that you'll notice, you'll be able to > > > >>>>> dig > > > >>>>>> the > > > >>>>>>> proper flag > > > >>>>>>>> to fix it for next time". 
> > > > > > > > > > > > > > > > > > I guess it's better than nothing, of course, but I'll admit that defaulting to "opt-in correctness", especially for a feature (LWT) that exists uniquely to provide additional guarantees, is something I have a hard time rallying behind.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > But a performance regression is a regression, I'm not shrugging it off. Still, I feel we shouldn't leave LWT with a fairly serious known correctness bug, and I frankly feel bad for "the project" that this has been known for so long without action, so I'm a bit biased in wanting to get it fixed asap.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > But maybe I'm overstating the urgency here, and maybe option #1 is a better way forward.
> > > >>>>>>>> > > > >>>>>>>> -- > > > >>>>>>>> Sylvain > > > >>>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>> > > > >>> > > --------------------------------------------------------------------- > > > >>>>>>> To unsubscribe, e-mail: > > dev-unsubscr...@cassandra.apache.org > > > >>>>>>> For additional commands, e-mail: > > > >> dev-h...@cassandra.apache.org > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>> > > > >>> > > --------------------------------------------------------------------- > > > >>>>>>> To unsubscribe, e-mail: > dev-unsubscr...@cassandra.apache.org > > > >>>>>>> For additional commands, e-mail: > > dev-h...@cassandra.apache.org > > > >>>>>>> > > > >>>>>>> > > > >>>>>> > > > >>>>>> > > > >>>>>> > > > >>>>>> > > > >>> > > --------------------------------------------------------------------- > > > >>>>>> To unsubscribe, e-mail: > dev-unsubscr...@cassandra.apache.org > > > >>>>>> For additional commands, e-mail: > > dev-h...@cassandra.apache.org > > > >>>>>> > > > >>>>>> > > > >>>>> > > > >>>> > > > >>>> > > > >>>> > > > >>>> > > > >> > > --------------------------------------------------------------------- > > > >>>> To unsubscribe, e-mail: > dev-unsubscr...@cassandra.apache.org > > > >>>> For additional commands, e-mail: > dev-h...@cassandra.apache.org > > > >>>> > > > >>> > > > >>> > > > > --------------------------------------------------------------------- > > > >>> To unsubscribe, e-mail: > dev-unsubscr...@cassandra.apache.org > > > >>> For additional commands, e-mail: > > dev-h...@cassandra.apache.org > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > --------------------------------------------------------------------- > > > >>> To unsubscribe, e-mail: > dev-unsubscr...@cassandra.apache.org > > > >>> For additional commands, e-mail: > dev-h...@cassandra.apache.org > > > >>> > > > >>> > > > >> > > > > > > > --------------------------------------------------------------------- > > > 