FYI: thanks to Kanthi, the Dask executor is back (with all tests):
https://github.com/apache/airflow/pull/22027

On Sat, Mar 5, 2022 at 10:03 PM Jarek Potiuk <[email protected]> wrote:

> FYI. I asked the question at Dask's discourse
> https://dask.discourse.group/t/potential-removal-of-dask-executor-support-in-airflow/433
>
> But I personally think we can take our "tactical" approach and merge
> the PR that disables the Dask tests,
> https://github.com/apache/airflow/pull/22017 - it should not hold us
> back, I think.
>
> On Sat, Mar 5, 2022 at 9:42 PM Jarek Potiuk <[email protected]> wrote:
>
>> Hello everyone,
>>
>> This is the second time [1] I am raising the question on the devlist
>> (last time the Dask team helped and I am going to reach out to them as
>> well).
>>
>> We have quite a problem with DaskExecutor in Airflow.
>>
>> When I raised it previously, all tests for the Dask Executor had been marked
>> as "skipped" and I asked whether to remove the Dask Executor altogether.
>> The Dask team responded and helped to enable the tests; however, since then
>> there has been no activity in this area. We have this code in our "dask" extra
>> and it limits us. For example, we cannot merge the new Looker library from
>> Google and (what's even more important) we cannot update Airflow to Python
>> 3.10 and macOS ARM (due to a cloudpickle limitation that prevents us from
>> upgrading apache-beam and numpy).
>>
>> Unfortunately, the Dask Executor is part of the "core" of Airflow, not a
>> provider, so we cannot really treat it as an "optional" provider.
>>
>> Because of that, we are using very old versions of cloudpickle and Dask's
>> `distributed` library:
>>
>>     # Dask support is limited, we need Dask team to upgrade support for dask if we were to continue
>>     # Supporting it in the future
>>     # TODO: upgrade libraries used or maybe deprecate and drop DASK support
>>     'cloudpickle>=1.4.1, <1.5.0',
>>     'dask>=2.9.0, <2021.6.1',  # dask 2021.6.1 does not work with `distributed`
>>     'distributed>=2.11.1, <2.20',
>>
>>
>> I tried to fix the tests, but there are many changes in the Dask
>> `distributed` library - including removal of parts of the test harness that
>> is used by some tests.
>>
>> My proposal (and I also created a PR
>> https://github.com/apache/airflow/pull/22017 for that):
>>
>> * remove the limitations from Dask libraries
>> * "skip" all the tests of Dask until they are fixed
>> * ask the Dask team to help with fixing those before we release 2.3.0 - if
>> they don't fix them, we will drop support for the Dask executor (or at least we
>> will not run tests for it and mark it as "untested")
>> * in the latter case we might actually bring back the dependencies that
>> "worked" for the "dask" extra in Airflow 2.3.0 - they will not be tested in our
>> unit tests, but if someone installs the "dask" extra it will work (but this will
>> also mean that some older providers will need to be installed, because
>> the newer ones will conflict with the dask extra)
>>
>> Another possibility might be to simply remove Dask support altogether or
>> move it to a new provider.
>>
>> Let me know what you think. This one pretty much blocks the release of
>> new providers (we are almost ready to add Looker) but more importantly it
>> blocks the effort of supporting Python 3.10 and ARM M1.
>>
>> I hope we can quickly make a tactical decision to merge the PR and work
>> with the Dask team on the next steps and make the final decision later.
>>
>> J.
>>
>> [1] https://lists.apache.org/thread/875fpgb7vfpmtxrmt19jmo8d3p6mgqnh
>>
>
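
For readers unfamiliar with the "skip all the tests until they are fixed" step proposed in the quoted message, here is a minimal sketch of the general technique. It uses the standard library's `unittest.skip` decorator so that it runs anywhere; it is not taken from the PR, which may well use a different mechanism. The class name, test name, and reason string are all hypothetical.

```python
import unittest

# Hypothetical reason string; the actual PR may word this differently.
SKIP_REASON = "Dask executor tests are disabled until they are fixed"


@unittest.skip(SKIP_REASON)
class TestDaskExecutorSketch(unittest.TestCase):
    """Hypothetical stand-in for a Dask executor test class."""

    def test_submit_task(self):
        # Never executed while the class-level skip is in place.
        self.fail("this test should have been skipped")


def run_sketch():
    """Run the sketch suite and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestDaskExecutorSketch)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    result = run_sketch()
    # Each entry in result.skipped is a (test, reason) pair.
    for test, reason in result.skipped:
        print(f"SKIPPED {test.id()}: {reason}")
```

A skipped test is reported as skipped, not as failed, so the suite stays green while the reason string records why the tests are not running.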
