[DISCUSSION] Assessing what is a breaking change for Airflow (SemVer context)

Jarek Potiuk Tue, 22 Nov 2022 01:32:17 -0800

Hello everyone,

We had a few discussions in PRs recently about removing some
functionality from the Airflow Core, and the question of backwards
compatibility came up. Two example discussions:


* https://github.com/apache/airflow/pull/27826
* https://github.com/apache/airflow/pull/27067

I think we should collectively decide whether we are OK with removing
some features from Airflow Core that are likely not heavily used and
that we assess as low-risk, and whether we are willing to take that
risk.

# Current status with SemVer:

We follow SemVer, which means that we cannot release Airflow 2.* with
any breaking change. Breaking changes should go to 3. Regarding what a
"breaking change" is, my definition is:

* any removal of any functionality (except experimental) is
automatically breaking

* any change of public "Interface" is breaking (though we have no
formal definition of what "public interface" is - there were a few
things that were implicitly seen as "public interface", Variables and
Connections for example - so it is a bit blurry what "public
interface" means)

However, this is just what **I** think the approach we have used so
far was. Others might have different ideas and understanding of what
"breaking" is, and I think this is the main problem we have: we do not
have a common understanding or definition of what "breaking" is. We
simply do not have it. One might say it is "common sense", and I used
to think so too - that it was obvious - but after being involved
in a number of discussions I started to change my thinking about it.

# My current view

My point (and something that I actually learned about recently) is
that there is no "objective" definition of "breaking". Quite recently
Hyrum's law https://www.hyrumslaw.com/ started to circulate in the IT
world. I've heard of it several times, and the more I think about it,
the more I agree with it. The law's summary is:

"With a sufficient number of users of an API, it does not matter what
you promise in the contract: all observable behaviors of your system
will be depended on by somebody."

This basically means that no matter how hard you try - any release you
make will be somewhat breaking.

My current interpretation/understanding is that "breaking" is really
not 0/1 - it is a continuum, and the definition of "breaking" is for me
"how likely it is that the change will break many people's workflows
in a way that will be difficult for them to recover from". Yes, there
are some "obvious" cases, and there I think we know what is breaking,
but I am talking about cases that are less obvious (like the two cases
above).

I believe we introduced SemVer in Airflow for one reason - to make our
users more confident that they can migrate easily without hitting
breaking changes. I think this is still a good idea and a good cause,
and we should continue doing that. However, what we can change is what
we see as our definition of "breaking", and we can start taking risks:
risks that we will break someone's workflow when we introduce a
change. I think we could take the risk in some cases (the two cases
above) where we agree that the risk is low and where we provide an easy
way to recover for those few users who will hit it.

# My Proposal

My proposal is that whenever we agree that some feature is very rarely
used, that removing it has very low impact, and that there is an
easy way to recover, we classify the removal as a "non-breaking
change" and remove the feature in the upcoming feature release. This
will need to be accompanied by documentation on how to recover, a
warning message when it happens, and a deliberate statement by the
community that we should remove it. It can be a [LAZY CONSENSUS] thread.
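
To make the "warning message when it happens" part concrete, here is
a minimal sketch of what such a message could carry. Everything in it
is an assumption for illustration only: the RemovedFeatureWarning
class, the warn_removed_feature helper and the example arguments are
hypothetical and are not existing Airflow APIs.

```python
import warnings


class RemovedFeatureWarning(UserWarning):
    """Hypothetical category for features removed as low-risk changes."""


def warn_removed_feature(name: str, recovery_hint: str, decision_thread: str) -> str:
    """Warn that a rarely used feature is gone and say how to recover.

    All three arguments are illustrative: the feature name, a short
    recovery instruction (or a pointer to the docs), and a reference to
    the devlist thread where the removal was agreed.
    """
    message = (
        f"'{name}' was removed as a low-risk change. "
        f"How to recover: {recovery_hint}. "
        f"Decision recorded in: {decision_thread}"
    )
    # stacklevel=2 attributes the warning to the caller of this helper.
    warnings.warn(message, RemovedFeatureWarning, stacklevel=2)
    return message
```

A code path that still references the removed feature could call
warn_removed_feature("some_feature", "use X instead", "devlist thread")
so that the few affected users see both the recovery path and where
the decision was made.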

Generally, it should be something that the community will be able to
react to and oppose (on the devlist - not only in a PR). This way we
will also keep a record of it in our archives, so that everyone can
find out that this was a deliberate decision and what the reasons were
for assessing the change as not risky.

If we agree to that, I am super happy to document it in our policies
in the README as the way we deal with such cases.

Let me know what you think. Maybe others have other proposals for
what our policy should be for such non-obvious cases.

J.
