How old is the defect surfaced by C-12126? That is, is this something we've had since the initial introduction of Paxos, or is it a regression we introduced somewhere along the way?
On Wed, Nov 11, 2020 at 11:03 AM Benjamin Lerer <benjamin.le...@datastax.com> wrote:
> CASSANDRA-12126 addresses one correctness issue of Lightweight
> Transactions. Unfortunately, the current patch developed by Sylvain and
> Benedict requires an extra round trip between the coordinator and the
> replicas for SERIAL and LOCAL_SERIAL reads.
> After some experimentation, Benedict discovered that this extra round trip
> could lead to a significant increase in timeouts for read-heavy workloads.
>
> Users for whom this behavior is a problem will be able to switch back to
> the old behavior using a system property, thereby choosing performance
> over correctness.
>
> In parallel, Benedict has worked on another approach that does not suffer
> from that performance problem and also addresses some LWT correctness
> issues that can occur when adding or removing nodes. He initially intended
> to deliver that improvement in 4.X but can try to incorporate it into 4.0.
>
> Regarding CASSANDRA-12126 and 4.0, we are facing several options, and
> Benedict, Sylvain and I wanted to get the community's feedback on them.
>
> We can:
>
> 1. Try to use Benedict's proposal for 4.0 if the community has the
>    appetite for it. The main issue there is some potential extra delay
>    for 4.0.
> 2. Do nothing for 4.0, meaning do not commit the current patch. We have
>    lived a long time with that issue and can probably wait a bit more
>    for a proper solution.
> 3. Commit the patch as is, fixing the correctness issue but potentially
>    introducing some performance problems until we release a better
>    solution.
> 4. Change the patch to default to the current behavior, but allow
>    people to enable the new one if correctness is a problem for them.
>
> Thanks in advance for your feedback.