Just to illustrate it better. Here is an extremely good example from today:

Our main build started to fail today (
https://github.com/apache/airflow/runs/4913148486?check_suite_focus=true)
because Pandas released a new 1.4.0 version last  night:
https://pypi.org/project/pandas/1.4.0/
The root cause of the failure was that Pandas 1.4.0 requires SqlAlchemy
1.4.0 or above. But surprisingly -  there is no hard limit in the
released Pandas. What's more - Pandas does not even mention that
it **might** need sqlalchemy at all (and in version >=1.4.0 !).

What happens is that in runtime, during running our tests we get this error:
`Pandas requires version '1.4.0' or newer of 'sqlalchemy' (version '1.3.24'
currently installed).`

We already had a limitation (not documented why) on Pandas

pandas_requirement = 'pandas>=0.17.1, <2.0'

However this `<2.0` was completely useless in this case, because it's 1.4
version that broke parts of Airflow.
We assumed that the <2.0 limitation will protect us (apparently) but it did
not. However, the failure in tests protected our constraints from being
upgraded and in our "main" pandas is still 1.3.4. Our users who follow "use
constraints" and those who use Airflow images, are fully protected against
those kinds of problems.

This particular problem will likely be solved in a few days when Flask
Application Builder 3.4.4rc1 (
https://pypi.org/project/Flask-AppBuilder/3.4.4rc1/|) will be released (It
moves the upper bound limit for sqlalchemy to <1.5 and that was the only
thing that blocked us from getting sqlalchemy 1.4).

My point is that simply we do not know if any future version of any
dependency will break Airflow. And any reasonable assumptions about that
guessing from "Major" version is IMHO just wild guessing. And also by
adding such limits - we are limiting our users to update to higher version
of dependencies when it is harmless (very similarly as Flask Application
Builder blocked us in this case from migration to sqlalchemy 1.4).

The temporary fix (until FAB 3.4.4 is released) is here:

https://github.com/apache/airflow/pull/21045

I believe after FAB 3.4.4 is released we remove the fix we should just get
rid of the pandas limitation. Constraints of ours protect our users when
they install Airflow "according to our instructions with constraints" -
which is the only official way of installing Airflow. We are protecting our
users from those kind of events, but at the same time we should relax
pretty much all the upper bounds to not block our users in the future in
case they need to move to higher versions of dependencies released in the
future that we have no idea if they will break, or not Airflow..

Let me know your thoughts - I think this Pandas case is great to illustrate
my point.

J.




On Sun, Jan 23, 2022 at 1:40 AM Jarek Potiuk <[email protected]> wrote:

> Hello everyone,
>
> TL;DR; I think there is one thing about dependencies that we should have
> some agreement on to make some common approach. Namely about using upper
> bounds on our dependencies.
> Should we set some rules on when we set upper-bounds on the deps ? What
> rules should they be?
>
> Currently we use constraints to make sure Airflow in specific version can
> be installed using "golden" set of dependencies. Those are part of our CI,
> automatically updated and tests which makes it really nice as they are
> "fixed" for installation of particular version but they continuously
> upgrade (even across major versions) when they pass tests in CI.
>
> This all happens pretty much automatically by our CI, except the cases
> where our dependencies are upper-limited. We occasionally do some
> "setup.py", "setup.cfg" changes manually to bump some upper limits, but
> this is more the result of some external request or "occasional" cleanup to
> do it - rather than a regular process.
>
> Now, we have some dependencies that are upper-bound for good reasons and
> they are documented. We also have a number of dependencies that are not
> upper-limited. But I think this pretty inconsistent. Small excerpt from
> setup.cfg:
>
>     # Logging is broken with itsdangerous > 2
>     itsdangerous>=1.1.0, <2.0
>     # Jinja2 3.1 will remove the 'autoescape' and 'with' extensions, which
> would
>     # break Flask 1.x, so we limit this for future compatibility. Remove
> this
>     # when bumping Flask to >=2.
>     jinja2>=2.10.1,<3.1
>     ...
>     # python daemon crashes with 'socket operation on non-socket' for
> python 3.8+ in version < 2.2.4
>     # https://pagure.io/python-daemon/issue/34
>     python-daemon>=2.2.4
>     python-dateutil>=2.3, <3
>     python-nvd3~=0.15.0
>
> * itsdangerous is upper limited and the reason is specified in the comment
> (though we do not know when we could remove the limit)
>
> * Jinja is upper-limited and we know not only why but also when it can be
> removed
> * Python-daemon is not upper-limited but it has a comment why it is
> "lower-limited" (which is pretty useless IMHO)
> * python-dateutil is upper-limited but we do not know why
> * python-nvd3 is also upper limited (~0.15.0 - limits it to any 0.15.x
> version but 0.16 could not be installed)
>
> I think there are many inconsistencies and the way we treat dependency
> limits is pretty inconsistent - we have no rules agreed.
>
> I would love to discuss how we can standardize it or at least set some
> rules we can follow.
>
> My initial thoughts are:
>
> * we can (but do not have to) have lower bounds if we need to protect and
> we know about some limitations, but we do not need to document those. Just
> set the limit. The nice thing about lower bounds that they do not "decay"
> over time. By default latest "eligible" version is installed by PIP (but of
> course if other deps have conflicting upper bounds, keeping lower limits
> when unnecessary is a problem.
> * by default we should not have upper-bounds. We know it limits our users
> and constraints + CI tests are nicely handling the scenario when things are
> breaking.
> * we should only introduce upper bounds if we know that there is breaking
> change (or that it is very likely and "foreseen" - betas/rc2/discussions
> about upcoming breaking changes in the dependencies
> * when we introduce upper-bounds we should always specify why and what is
> the condition to remove them
>
> WDYT?
>
> J.
>
>
>
>

Reply via email to