When people are serious about this requirement, they’ll build downgrade equivalents of the upgrade tests and run them automatically and often, so that everyone understands what the real gap is and notices when something new breaks it.
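To illustrate, a minimal sketch of what such a downgrade test could look like, mirroring the shape of an upgrade test run in reverse. The VersionedCluster interface and start() factory here are hypothetical stand-ins for whatever harness the project's upgrade dtests actually use:

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class DowngradeReadTest
{
    // Stand-in for the real multi-version cluster harness.
    interface VersionedCluster extends AutoCloseable
    {
        void execute(String cql);
        Object[][] query(String cql);
        void flush();   // force memtables to SSTables in the running version's format
    }

    // Placeholder factory: starts a cluster on the given release, reusing dataDir.
    static VersionedCluster start(String version, String dataDir)
    {
        throw new UnsupportedOperationException("bind to the real dtest harness");
    }

    @Test
    public void sstablesWrittenByNewVersionLoadOnPreviousVersion() throws Exception
    {
        String dataDir = "/tmp/downgrade-test";

        try (VersionedCluster current = start("5.0", dataDir))
        {
            current.execute("CREATE TABLE ks.t (k int PRIMARY KEY, v int)");
            current.execute("INSERT INTO ks.t (k, v) VALUES (1, 1)");
            current.flush();   // data now lives in the new on-disk format
        }

        // The downgrade step: same data directory, one release back.
        try (VersionedCluster previous = start("4.1", dataDir))
        {
            assertEquals(1, previous.query("SELECT v FROM ks.t WHERE k = 1")[0][0]);
        }
    }
}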

Until those tests exist, I think we should all collectively stop pretending this is dogma. Best effort is best effort.



On Feb 22, 2023, at 6:57 AM, Branimir Lambov <branimir.lam...@datastax.com> wrote:


> 1. Major SSTable changes should begin with forward-compatibility in a prior release.

This requires "feature" changes, i.e. new non-trivial code in previous patch releases. It also entails porting over any further format modifications.

Instead of this, in combination with your second point, why not implement backwards write compatibility? The opt-in is then clearer to define: upgrades start with e.g. a "4.1-compatible" settings set that pins the file format for compatibility and disables new features, while new nodes start with the "current" settings set. When the upgrade completes and the user is happy with the result, the settings set can be replaced.
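To make the idea concrete, a hypothetical sketch of such a settings set; the names and fields are placeholders, not a proposal for the actual configuration surface:

public enum SettingsSet
{
    // Pins the on-disk format to what 4.1 can read and keeps new features off.
    V41_COMPATIBLE(/* writeLegacyFormat */ true,  /* newFeaturesEnabled */ false),
    // Latest format, all features: what the operator switches to after the upgrade.
    CURRENT       (/* writeLegacyFormat */ false, /* newFeaturesEnabled */ true);

    public final boolean writeLegacyFormat;
    public final boolean newFeaturesEnabled;

    SettingsSet(boolean writeLegacyFormat, boolean newFeaturesEnabled)
    {
        this.writeLegacyFormat = writeLegacyFormat;
        this.newFeaturesEnabled = newFeaturesEnabled;
    }
}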

Doesn't this achieve what you want (and what we all agree is a worthy goal) with much less effort for everyone? Supporting backwards-compatible writing is trivial, and we even have a proof-of-concept in the stats metadata serializer. It also reduces, by a serious margin, the amount of work and thinking required when a format improvement is implemented -- e.g. the TTL patch could address this exactly the way earlier versions of the format did, by capping expiration times to 2038, with no configuration flags to specify, obey, or test.
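For instance, a backwards-compatible write path for the TTL change could look roughly like the following; the names are illustrative, but the cap is the familiar 2038-01-19 limit of a signed 32-bit seconds-since-epoch field:

// localDeletionTime is seconds since the epoch; legacy formats store it as a
// signed 32-bit int, so the largest representable instant is
// 2038-01-19T03:14:07Z (Integer.MAX_VALUE seconds).
static final long MAX_LEGACY_LOCAL_DELETION_TIME = Integer.MAX_VALUE;

static long localDeletionTimeForWrite(long localDeletionTime, boolean writingLegacyFormat)
{
    // When writing a legacy-compatible format, cap instead of failing --
    // the same way the 2038 limit was handled before the field was widened.
    return writingLegacyFormat
         ? Math.min(localDeletionTime, MAX_LEGACY_LOCAL_DELETION_TIME)
         : localDeletionTime;
}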

>> It’s a commitment, and it requires every contributor to consider it as part of work they produce.

> But it shouldn't be a burden. Ability to downgrade is a testable problem, so I see this work as a function of the suite of tests the project is willing to agree on supporting.

I fully agree with this sentiment, and I feel that the current "try not to introduce breaking changes" approach adds the burden without the benefit -- because the benefit cannot be proven, and is most likely already lost.

Regards,
Branimir

On Wed, Feb 22, 2023 at 1:01 AM Abe Ratnofsky <a...@aber.io> wrote:
Some interesting existing work on this subject is "Understanding and Detecting Software Upgrade Failures in Distributed Systems" - https://dl.acm.org/doi/10.1145/3477132.3483577, also summarized by Andrey Satarin here: https://asatarin.github.io/talks/2022-09-upgrade-failures-in-distributed-systems/

They specifically tested Cassandra upgrades and present a solid list of defects they found. They also describe their testing mechanism, DUPTester, which includes a component that confirms the leftover state from one version can start up on the next version. The paper highlights a wider scope of upgrade defects, beyond SSTable version support.

I believe the project would benefit from expanding our test suite similarly, by parametrizing more tests on upgrade version pairs.
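As a sketch, such a parametrization could look like the JUnit skeleton below, in the spirit of DUPTester's leftover-state check; the version pairs and the test body are placeholders for the project's real upgrade harness:

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;
import java.util.Arrays;
import java.util.Collection;

@RunWith(Parameterized.class)
public class LeftoverStateUpgradeTest
{
    @Parameters(name = "{0} -> {1}")
    public static Collection<Object[]> versionPairs()
    {
        // Illustrative pairs only; the real list would cover supported upgrade paths.
        return Arrays.asList(new Object[][] {
            { "3.11", "4.0" },
            { "4.0",  "4.1" },
            { "4.1",  "5.0" },
        });
    }

    @Parameterized.Parameter(0) public String from;
    @Parameterized.Parameter(1) public String to;

    @Test
    public void leftoverStateStartsUpOnNextVersion()
    {
        // 1. run a workload on `from`, then stop mid-flight so commitlogs,
        //    hints and SSTables are left behind
        // 2. start `to` on the same data directories
        // 3. assert a clean startup and that the data remains readable
    }
}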

Also, per Benedict's comment:

> It’s a commitment, and it requires every contributor to consider it as part of work they produce.

But it shouldn't be a burden. Ability to downgrade is a testable problem, so I see this work as a function of the suite of tests the project is willing to agree on supporting.

Specifically - I agree with Scott's proposal to emulate the HDFS upgrade-then-finalize approach. I would also support automatic finalization based on a time threshold or similar, to balance the competing priorities of safe and straightforward upgrades. Users need to be aware of the range of SSTable formats supported by a given version, and how to proceed when their SSTables wouldn't be supported by an upcoming upgrade.
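A rough sketch of what an upgrade-then-finalize gate with time-based auto-finalization could look like; everything here (names, the operator trigger, the threshold) is hypothetical:

import java.time.Duration;
import java.time.Instant;

public class UpgradeFinalization
{
    private final Instant upgradedAt;
    private final Duration autoFinalizeAfter;   // e.g. Duration.ofDays(7); null = manual only
    private volatile boolean finalized;

    public UpgradeFinalization(Instant upgradedAt, Duration autoFinalizeAfter)
    {
        this.upgradedAt = upgradedAt;
        this.autoFinalizeAfter = autoFinalizeAfter;
    }

    /** Operator-triggered finalization, e.g. via a nodetool command. */
    public void finalizeUpgrade() { finalized = true; }

    public boolean isFinalized()
    {
        if (!finalized && autoFinalizeAfter != null
            && Instant.now().isAfter(upgradedAt.plus(autoFinalizeAfter)))
            finalized = true;   // time threshold reached: finalize automatically
        return finalized;
    }

    /** Writers consult this: until finalization, keep emitting the old format
        so a downgrade remains possible. */
    public boolean writeLegacyFormat() { return !isFinalized(); }
}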

--
Abe

