Hello everyone,
This is the second time [1] I am raising the question on the devlist (last
time the Dask team helped and I am going to reach out to them as well).
We have quite a problem with DaskExecutor in Airflow.
Previously, when I raised it, all tests for the Dask Executor had been
marked as "skipped", and I asked whether to remove the Dask Executor
altogether. The Dask team responded and helped to enable the tests; however,
since then there has been no activity in this area. We have the code quoted
below in our "dask" extra, and it limits us. For example, we cannot merge the
new Looker library from Google and (what's even more important) we cannot
update Airflow to support Python 3.10 and macOS ARM (due to the cloudpickle
limitation that prevents us from upgrading apache-beam and numpy).
Unfortunately, the Dask Executor is part of the "core" of Airflow, not a
provider, so we cannot really treat it as an "optional" provider.
Because of that, we are stuck with very old versions of cloudpickle and of
Dask's `distributed` library:
    # Dask support is limited, we need Dask team to upgrade support for dask if we were to continue
    # Supporting it in the future
    # TODO: upgrade libraries used or maybe deprecate and drop DASK support
    'cloudpickle>=1.4.1, <1.5.0',
    'dask>=2.9.0, <2021.6.1',  # dask 2021.6.1 does not work with `distributed`
    'distributed>=2.11.1, <2.20',
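
To make the effect of these pins concrete, here is a minimal sketch of how
the upper bounds shut out newer releases. It is a simplification that assumes
plain dotted numeric versions (it is not a full PEP 440 implementation), and
the helper names `parse` and `allowed` are hypothetical; they are not part of
Airflow or pip.

```python
def parse(version):
    """Turn a plain dotted version such as '1.4.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def allowed(version, lower, upper):
    """Return True if lower <= version < upper (simplified, numeric-only check)."""
    return parse(lower) <= parse(version) < parse(upper)


# Bounds copied from the "dask" extra quoted above.
CLOUDPICKLE = ("1.4.1", "1.5.0")
DISTRIBUTED = ("2.11.1", "2.20")

# Inside the pinned range.
print(allowed("1.4.1", *CLOUDPICKLE))
# Any 2.x release of cloudpickle is rejected by the "<1.5.0" bound.
print(allowed("2.0.0", *CLOUDPICKLE))
# Calendar-versioned releases of distributed are rejected by the "<2.20" bound.
print(allowed("2021.6.0", *DISTRIBUTED))
```

Any other dependency that needs a cloudpickle release newer than 1.4.x
therefore cannot be installed together with the "dask" extra.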
I tried to fix the tests, but there are many changes in the Dask
`distributed` library, including the removal of parts of the test harness
that are used by some tests.
My proposal (and I also created a PR
https://github.com/apache/airflow/pull/22017 for that):
* remove the version limits on the Dask libraries
* "skip" all the Dask tests until they are fixed
* ask the Dask team to help with fixing them before we release 2.3.0 - if
they don't fix them, we will drop support for the Dask Executor (or at least
we will not run tests for it and will mark it as "untested")
* in the latter case we might actually bring back the dependencies that
"worked" for the "dask" extra in Airflow 2.3.0 - they will not be tested in
our unit tests, but if someone installs the "dask" extra it will work (this
will also mean that some older providers will need to be installed, because
the newer ones will conflict with the "dask" extra)
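
As a rough illustration of the "skip" step, here is a minimal sketch using
`unittest.skip` from the standard library. The class name, the reason string
and the `run_sketch` helper are hypothetical, and the actual PR may use a
different mechanism (for example a pytest marker); this only shows the idea
of keeping the tests in the tree while not running them.

```python
import unittest

# Hypothetical reason string; the real wording would be decided in the PR.
SKIP_REASON = "Dask tests are skipped until they are fixed for newer Dask libraries"


@unittest.skip(SKIP_REASON)
class TestDaskExecutorSketch(unittest.TestCase):
    """Stand-in for the real Dask Executor tests (hypothetical class name)."""

    def test_placeholder(self):
        # Never executed while the class-level skip is in place.
        self.fail("this body does not run while the test is skipped")


def run_sketch():
    """Run the stand-in test case and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestDaskExecutorSketch)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_sketch()
    # Each skipped test is recorded together with the reason it was skipped.
    for test, reason in outcome.skipped:
        print(f"SKIPPED {test.id()}: {reason}")
```

Removing the decorator once the tests are fixed re-enables them without any
other change.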
Another possibility might be to simply remove Dask support altogether or
move it to a new provider.
Let me know what you think. This one pretty much blocks the release of new
providers (we are almost ready to add Looker) but more importantly it
blocks the effort of supporting Python 3.10 and ARM M1.
I hope we can quickly make a tactical decision to merge the PR and work
with the Dask team on the next steps and make the final decision later.
J.
[1] https://lists.apache.org/thread/875fpgb7vfpmtxrmt19jmo8d3p6mgqnh