I think we should move it out to a separate provider at the very least.
Ideally DaskExecutor should be maintained by the Dask team IMHO, so I would
be for deprecating it now and removing it in 3.0 (and offering the Dask
team to take it over).


On Tue, Mar 8, 2022 at 5:42 PM Elad Kalif <[email protected]> wrote:

> In the last 2 surveys we had a question of "What executor type do you use?"
> Dask was included in the Other choice and as expected few users use this.
> While we can not really rely on this survey I think it does give some
> information about usage.
>
> Do we really want to maintain core functionality for such a small number
> of users? What is the value in it?
> And also, can we remove it in a feature release? I'm not 100% sure on that.
>
> On Tue, Mar 8, 2022 at 6:09 PM Jarek Potiuk <[email protected]> wrote:
>
>> FYI: thanks to Kanthi, the Dask executor is back (with all tests)
>> https://github.com/apache/airflow/pull/22027
>>
>> On Sat, Mar 5, 2022 at 10:03 PM Jarek Potiuk <[email protected]> wrote:
>>
>>> FYI. I asked the question at Dask's discourse
>>> https://dask.discourse.group/t/potential-removal-of-dask-executor-support-in-airflow/433
>>>
>>> But I personally think we can take our "tactical" approach and merge
>>> the change "disabling" the Dask tests via
>>> https://github.com/apache/airflow/pull/22017 - it should not hold us
>>> back, I think.
>>>
>>> On Sat, Mar 5, 2022 at 9:42 PM Jarek Potiuk <[email protected]> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> This is the second time [1] I am raising the question on the devlist
>>>> (last time the Dask team helped and I am going to reach out to them as
>>>> well).
>>>>
>>>> We have quite a problem with DaskExecutor in Airflow.
>>>>
>>>> Previously when I raised it, all tests in Dask Executor had been
>>>> marked as "skipped" and I asked whether to remove the Dask Executor
>>>> altogether. The Dask team responded and helped to enable the tests; however,
>>>> since then there has been no activity in this area. We have this code in our
>>>> "dask" extra - and it limits us. For example - we cannot merge the new
>>>> Looker library from Google and (what's even more important) we cannot
>>>> update Airflow to Python 3.10 and MacOS ARM (due to a cloudpickle limitation
>>>> that prevents us from upgrading apache-beam and numpy).
>>>>
>>>> Unfortunately the Dask Executor is part of the "core" of Airflow, not a
>>>> provider, so we cannot really treat it as an "optional" provider.
>>>>
>>>> Because of that, we are using very old versions of cloudpickle and Dask's
>>>> `distributed` library.
>>>>
>>>>     # Dask support is limited, we need Dask team to upgrade support for dask if we were to continue
>>>>     # Supporting it in the future
>>>>     # TODO: upgrade libraries used or maybe deprecate and drop DASK support
>>>>     'cloudpickle>=1.4.1, <1.5.0',
>>>>     'dask>=2.9.0, <2021.6.1',  # dask 2021.6.1 does not work with `distributed`
>>>>     'distributed>=2.11.1, <2.20',
>>>>
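
To make those pins concrete, here is a small sketch of which versions they admit. It is a simplified, standard-library-only comparison (numeric release segments only, not full PEP 440), and the names `DASK_EXTRA_PINS`, `parse`, `in_range` and `allowed` are made up for illustration; they are not part of Airflow's setup.py. The versions probed in the usage lines are arbitrary examples.

```python
def parse(version: str) -> tuple:
    """Turn '1.4.1' into (1, 4, 1); trailing zeros are dropped so '2.20' == '2.20.0'."""
    parts = [int(part) for part in version.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def in_range(version: str, lower: str, upper: str) -> bool:
    """True when lower <= version < upper, mirroring a '>=lower, <upper' pin."""
    return parse(lower) <= parse(version) < parse(upper)


# The pins quoted above, as (inclusive lower bound, exclusive upper bound).
# The dictionary itself is a hypothetical helper, not Airflow code.
DASK_EXTRA_PINS = {
    "cloudpickle": ("1.4.1", "1.5.0"),
    "dask": ("2.9.0", "2021.6.1"),
    "distributed": ("2.11.1", "2.20"),
}


def allowed(package: str, version: str) -> bool:
    """Check one candidate version against the quoted pin for that package."""
    lower, upper = DASK_EXTRA_PINS[package]
    return in_range(version, lower, upper)


if __name__ == "__main__":
    for package, version in [("cloudpickle", "1.4.1"), ("cloudpickle", "2.0.0"),
                             ("distributed", "2.20.0")]:
        print(package, version, "allowed" if allowed(package, version) else "blocked")
```

Any cloudpickle release from 1.5.0 onwards falls outside the pin, which is how one extra ends up holding back everything else that wants a newer cloudpickle. For real checks, `SpecifierSet` from the `packaging` library handles pre-releases and the other PEP 440 cases this sketch ignores.
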
>>>>
>>>> I tried to fix the tests, but there are many changes in the Dask
>>>> `distributed` library - including removal of parts of the test harness that
>>>> is used by some tests.
>>>>
>>>> My proposal (and I also created a PR
>>>> https://github.com/apache/airflow/pull/22017 for that):
>>>>
>>>> * remove the limitations from Dask libraries
>>>> * "skip" all the tests of Dask until they are fixed
>>>> * ask the Dask team to help with fixing those until we release 2.3.0 -
>>>> if they won't fix them we will drop support for dask executor (or at least
>>>> we will not run tests for it and mark it as "untested")
>>>> * in the latter case we might actually bring back the dependencies that
>>>> "worked" for the "dask" extra in Airflow 2.3.0 - they will not be tested in our
>>>> unit tests but if someone installs the "dask" extra it will work (but this will
>>>> also mean that some older providers will need to be installed - because
>>>> the newer ones will conflict with the dask extra)
>>>>
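
For readers unfamiliar with what "skipping" means in practice, a minimal sketch follows. It uses the standard library's `unittest` as a stand-in; the thread does not show how the linked PR actually disables the tests, so the class name, the reason string and the `run_sketch` helper are all hypothetical.

```python
import unittest

# Hypothetical reason string; not taken from the Airflow code base.
DASK_SKIP_REASON = "Dask tests are skipped until they are fixed"


@unittest.skip(DASK_SKIP_REASON)
class TestDaskExecutorSketch(unittest.TestCase):
    """Placeholder test case; the body never runs while the class is skipped."""

    def test_submit(self):
        self.fail("not reached while the class is skipped")


def run_sketch() -> unittest.TestResult:
    """Run the placeholder case and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDaskExecutorSketch)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_sketch()
    for test, reason in outcome.skipped:
        print(f"skipped {test.id()}: {reason}")
```

A skipped test is reported but never fails the build, which is why this unblocks the dependency upgrades while leaving the executor effectively "untested" until someone fixes the suite.
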
>>>> Another possibility might be to simply remove Dask support altogether
>>>> or move it to a new provider.
>>>>
>>>> Let me know what you think. This one pretty much blocks the release of
>>>> new providers (we are almost ready to add Looker) but more importantly it
>>>> blocks the effort of supporting Python 3.10 and ARM M1.
>>>>
>>>> I hope we can quickly make a tactical decision to merge the PR and work
>>>> with the Dask team on the next steps and make the final decision later.
>>>>
>>>> J.
>>>>
>>>> [1] https://lists.apache.org/thread/875fpgb7vfpmtxrmt19jmo8d3p6mgqnh
>>>>
>>>
