> why not implement backwards write compatibility?

+1 to this from a philosophical perspective. Keeping prior releases completely in the dark about new release sstable formats is a clean approach, and we should already have the code around to ser/deser the prior version's data on the next version.
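Here is a minimal sketch of what backwards write compatibility could look like for a single field. Everything in it is hypothetical: `DeletionTimeCodec`, `WriteFormat`, and the 64-bit "current" layout are illustrative assumptions, not Cassandra's actual classes or on-disk format. A node running with a prior-release ("4.1-compatible") setting writes the old signed 32-bit deletion-time layout and caps expirations at 2038. That is how the thread says earlier format versions handled the TTL problem. With the "current" setting, the node writes the wider layout. The reader understands both, which the new version needs anyway. The exact cap Cassandra applies may differ (for example, if some values are reserved as sentinels), so treat `Integer.MAX_VALUE` here as an assumption.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch of "backwards write compatibility": a newer node can keep
 * writing the prior release's on-disk layout until the operator switches settings.
 * Names and the 64-bit "current" layout are illustrative, not Cassandra's API.
 */
class DeletionTimeCodec {
    /** Which layout to write; mirrors a "4.1-compatible" vs "current" settings set. */
    public enum WriteFormat { PRIOR, CURRENT }

    /** Largest value a signed 32-bit seconds field can hold: 2038-01-19T03:14:07Z. */
    public static final long PRIOR_MAX_SECONDS = Integer.MAX_VALUE;

    /** Encodes a local deletion time (seconds since the epoch) in the requested layout. */
    public static byte[] encode(WriteFormat format, long deletionTimeSeconds) {
        if (deletionTimeSeconds < 0) {
            throw new IllegalArgumentException("deletion time must be non-negative");
        }
        if (format == WriteFormat.PRIOR) {
            // Prior layout: signed int, so expirations past 2038 are capped.
            int capped = (int) Math.min(deletionTimeSeconds, PRIOR_MAX_SECONDS);
            return ByteBuffer.allocate(Integer.BYTES).putInt(capped).array();
        }
        // Current layout (an assumption for this sketch): a 64-bit field with no cap.
        return ByteBuffer.allocate(Long.BYTES).putLong(deletionTimeSeconds).array();
    }

    /** Decodes a value written by encode(); the new version must read both layouts. */
    public static long decode(WriteFormat format, byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return format == WriteFormat.PRIOR ? buf.getInt() : buf.getLong();
    }

    public static void main(String[] args) {
        long year2100 = 4_102_444_800L; // 2100-01-01T00:00:00Z
        byte[] prior = encode(WriteFormat.PRIOR, year2100);
        byte[] current = encode(WriteFormat.CURRENT, year2100);
        System.out.println(prior.length + " " + decode(WriteFormat.PRIOR, prior));
        System.out.println(current.length + " " + decode(WriteFormat.CURRENT, current));
    }
}
```

Running `main` prints `4 2147483647` and `8 4102444800`. The prior layout caps a year-2100 expiration at 2038, while the current layout keeps it intact. No configuration flag beyond the choice of write format is involved.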
On Wed, Feb 22, 2023, at 10:07 AM, Jeff Jirsa wrote:
> When people are serious about this requirement, they’ll build the downgrade
> equivalents of the upgrade tests and run them automatically, often, so people
> understand what the real gap is and when something new makes it break
>
> Until those tests exist, I think collectively we should all stop pretending
> like this is dogma. Best effort is best effort.
>
>> On Feb 22, 2023, at 6:57 AM, Branimir Lambov <branimir.lam...@datastax.com> wrote:
>>
>> > 1. Major SSTable changes should begin with forward-compatibility in a
>> > prior release.
>>
>> This requires "feature" changes, i.e. new non-trivial code for previous
>> patch releases. It also entails porting over any further format modification.
>>
>> Instead of this, in combination with your second point, why not implement
>> backwards write compatibility? The opt-in is then clearer to define (i.e.
>> upgrades start with e.g. a "4.1-compatible" settings set that includes file
>> format compatibility and disabling of new features, new nodes start with
>> "current" settings set). When the upgrade completes and the user is happy
>> with the result, the settings set can be replaced.
>>
>> Doesn't this achieve what you want (and we all agree is a worthy goal) with
>> much less effort for everyone? Supporting backwards-compatible writing is
>> trivial, and we even have a proof-of-concept in the stats metadata
>> serializer. It also simplifies by a serious margin the amount of work and
>> thinking one has to do when a format improvement is implemented -- e.g. the
>> TTL patch can just address this in exactly the way the problem was addressed
>> in earlier versions of the format, by capping to 2038, without any need to
>> specify, obey or test any configuration flags.
>>
>> >> It’s a commitment, and it requires every contributor to consider it as
>> >> part of work they produce.
>>
>> > But it shouldn't be a burden.
>> > Ability to downgrade is a testable problem,
>> > so I see this work as a function of the suite of tests the project is
>> > willing to agree on supporting.
>>
>> I fully agree with this sentiment, and I feel that the current "try to not
>> introduce breaking changes" approach is adding the burden, but not the
>> benefits -- because the latter cannot be proven, and are most likely already
>> broken.
>>
>> Regards,
>> Branimir
>>
>> On Wed, Feb 22, 2023 at 1:01 AM Abe Ratnofsky <a...@aber.io> wrote:
>>> Some interesting existing work on this subject is "Understanding and
>>> Detecting Software Upgrade Failures in Distributed Systems" -
>>> https://dl.acm.org/doi/10.1145/3477132.3483577,
>>> also summarized by Andrey Satarin here:
>>> https://asatarin.github.io/talks/2022-09-upgrade-failures-in-distributed-systems/
>>>
>>> They specifically tested Cassandra upgrades, and have a solid list of
>>> defects that they found. They also describe their testing mechanism
>>> DUPTester, which includes a component that confirms that the leftover state
>>> from one version can start up on the next version. There is a wider scope
>>> of upgrade defects highlighted in the paper, beyond SSTable version support.
>>>
>>> I believe the project would benefit from expanding our test suite
>>> similarly, by parametrizing more tests on upgrade version pairs.
>>>
>>> Also, per Benedict's comment:
>>>
>>> > It’s a commitment, and it requires every contributor to consider it as
>>> > part of work they produce.
>>>
>>> But it shouldn't be a burden.
>>> Ability to downgrade is a testable problem,
>>> so I see this work as a function of the suite of tests the project is
>>> willing to agree on supporting.
>>>
>>> Specifically - I agree with Scott's proposal to emulate the HDFS
>>> upgrade-then-finalize approach. I would also support automatic finalization
>>> based on a time threshold or similar, to balance the priorities of safe and
>>> straightforward upgrades. Users need to be aware of the range of SSTable
>>> formats supported by a given version, and how to handle when their SSTables
>>> wouldn't be supported by an upcoming upgrade.
>>>
>>> --
>>> Abe
>>
>>
>> --
>> Branimir Lambov
>> e. branimir.lam...@datastax.com
>> w. www.datastax.com
>>
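Abe's suggestion of an upgrade-then-finalize flow with automatic finalization after a time threshold can be sketched as a small state machine. This is a hypothetical illustration, not an existing Cassandra API. `UpgradeLifecycle`, its phases, and the seven-day threshold are assumptions. Until the upgrade is finalized, either by the operator or by the timer, the node keeps writing the prior format, so a downgrade stays possible. After finalization it writes the current format, and the change is one-way.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical sketch of an "upgrade, then finalize" lifecycle with optional
 * time-based auto-finalization. Not an existing Cassandra API.
 */
class UpgradeLifecycle {
    enum Phase { UPGRADING, FINALIZED }

    /** Format to write: prior-compatible until finalized, current afterwards. */
    enum WriteFormat { PRIOR, CURRENT }

    private final Instant upgradeStartedAt;
    private final Duration autoFinalizeAfter; // null disables auto-finalization
    private Phase phase = Phase.UPGRADING;

    UpgradeLifecycle(Instant upgradeStartedAt, Duration autoFinalizeAfter) {
        this.upgradeStartedAt = upgradeStartedAt;
        this.autoFinalizeAfter = autoFinalizeAfter;
    }

    /** Operator explicitly commits to the new version; downgrade is no longer supported. */
    void finalizeUpgrade() {
        phase = Phase.FINALIZED;
    }

    /** Returns the phase at `now`, finalizing (one-way) once the threshold has elapsed. */
    Phase phaseAt(Instant now) {
        if (phase == Phase.UPGRADING && autoFinalizeAfter != null
                && !now.isBefore(upgradeStartedAt.plus(autoFinalizeAfter))) {
            phase = Phase.FINALIZED;
        }
        return phase;
    }

    WriteFormat writeFormatAt(Instant now) {
        return phaseAt(now) == Phase.FINALIZED ? WriteFormat.CURRENT : WriteFormat.PRIOR;
    }

    boolean downgradeSupportedAt(Instant now) {
        return phaseAt(now) == Phase.UPGRADING;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2023-02-22T00:00:00Z");
        UpgradeLifecycle lifecycle = new UpgradeLifecycle(start, Duration.ofDays(7));
        System.out.println(lifecycle.writeFormatAt(start.plus(Duration.ofDays(1)))); // PRIOR
        System.out.println(lifecycle.writeFormatAt(start.plus(Duration.ofDays(7)))); // CURRENT
    }
}
```

Passing `now` explicitly instead of reading the system clock keeps the threshold logic deterministic. That matters if automated downgrade tests of the kind Jeff describes are to exercise both sides of the finalization boundary.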