We are actually planning (pending confirmation) a "First-Time Apache Airflow
Contributors" training at PyCon US in April. I think if there is good
usage of Dask and we get scientifically oriented users using Airflow with
Dask, I am all for closer cooperation on that topic :).

J.


On Mon, Jan 20, 2020 at 5:25 PM Darren Weber <[email protected]>
wrote:

> Thanks for the ping on https://github.com/dask/dask/issues/5803
>
> I'm curious about how dask async features might be low-hanging fruit for
> Airflow scaling
> - https://distributed.dask.org/en/latest/asynchronous.html
> - https://github.com/apache/airflow/pull/6984
>
> Our company has scientific workflows and it uses dask, usually on large EC2
> instances or batch jobs.  I've been getting familiar with dask from a user
> perspective; I don't yet know the internals from a dev-perspective.  I
> mostly use dask.delayed to scale threads/processes on a local host, with a
> simple concurrent.futures API.  Dask.distributed can also run a cluster
> with client connections (I previously worked with spark a bit and dask has
> some good documentation on the comparisons between spark and dask).  There
> are also some options for auto-scaling a dask cluster using k8s -
> https://docs.dask.org/en/latest/setup/adaptive.html - so you get an
> auto-scaling cluster with a lot of features for scientific computing with
> the scipy-compatible stack.
>
> I can't promise to complete anything in a timely manner, despite any
> proposals to remove dask executors entirely.  I may be in-n-out of these
> discussions from time-to-time, possibly silent for several weeks at a time
> while I'm heads down on my full-time position.  So if Airflow 2.0 removes
> them for whatever reason, I would hope it could be possible to add them
> back in Airflow 2.1 if the work can be done to get it working and the
> design patterns make sense and/or there is a larger user community than
> anyone is yet aware of.  At present, I don't hear a clear specification for
> having it work or an argument that it doesn't work at all, but I hear and
> see that unit tests are disabled.  It might be possible to identify in dask
> itself how to set up the test environment.  It might help to better
> understand the niche that dask serves well.
>
> The online forums and github may suffice, but if it would be possible to
> find funding to sponsor a joint hack-a-thon at PyCon or something, that
> would be great.  As a new contributor to Airflow, I'm still learning the
> ropes and it would be good to attend an Airflow contributor workshop (maybe
> someone could spin one up in the bay-area?).
>
> Best,
> Darren
>
>
>
>
> On Sun, Jan 19, 2020 at 9:28 AM Jarek Potiuk <[email protected]>
> wrote:
>
> > Seems like there is interest: https://github.com/dask/dask/issues/5803
> > :).
> > Let's see where it gets us.
> >
> > J.
> >
> > On Sat, Jan 18, 2020 at 9:46 PM Jarek Potiuk <[email protected]>
> > wrote:
> >
> > > Following the discussion on Dask's gitter, I created an issue in Dask's
> > > github:
> > > https://github.com/dask/dask/issues/5803
> > >
> > > Let's see if we can get someone from Dask community interested.
> > >
> > > On Fri, Jan 17, 2020 at 10:00 PM Jarek Potiuk <[email protected]>
> > > wrote:
> > >
> > >> Good idea :) doing that,
> > >>
> > >> On Fri, Jan 17, 2020 at 9:58 PM Daniel Imberman <
> > >> [email protected]> wrote:
> > >>
> > >>> Maybe we can reach out to a company that does Dask as a service?
> > >>>
> > > >>> On Fri, Jan 17, 2020 at 9:31 AM, Jarek Potiuk <[email protected]>
> > > >>> wrote:
> > > >>> Yeah. I think if we do not find anyone willing to champion it (whether
> > > >>> committer or contributor), I would be for dropping it.
> > >>>
> > >>> J.
> > >>>
> > >>> On Fri, Jan 17, 2020 at 6:07 PM Daniel Imberman <
> > >>> [email protected]>
> > >>> wrote:
> > >>>
> > > >>> > I think we need to ask “who is going to champion this executor.” I
> > > >>> > see that it is being used (a bit), but am concerned if no one with
> > > >>> > knowledge of this executor is willing to maintain it.
> > >>> >
> > > >>> > I’ve personally never used Dask and the DaskExecutor isn’t super
> > > >>> > high on my priority list compared to things like autoscaling, DAG
> > > >>> > serialization, etc.
> > >>> >
> > > >>> > On Fri, Jan 17, 2020 at 6:07 AM, Jarek Potiuk <[email protected]>
> > > >>> > wrote:
> > > >>> > Do we have anyone here who uses the Dask Executor and would like to
> > > >>> > test it/fix the tests? They are now marked as xfailed (expected to
> > > >>> > fail) and it would be great to fix them.
> > >>> >
> > >>> > J.
> > >>> >
> > >>> >
> > > >>> > On Tue, Jan 14, 2020 at 12:18 AM Darren Weber <[email protected]>
> > > >>> > wrote:
> > >>> >
> > >>> > > +1 for keeping it and fixing tests
> > >>> > >
> > > >>> > > PS, I also noticed the skipped tests while looking at an option to
> > > >>> > > use the async client feature; if/when I get time to get back on that
> > > >>> > > and figure out how the test setup needs to work, I might also
> > > >>> > > discover how to enable tests for the non-async executor. No
> > > >>> > > promises, just noting that I'm aware of it too.
> > >>> > >
> > > >>> > > On Mon, Jan 13, 2020 at 8:06 AM Jarek Potiuk <[email protected]>
> > > >>> > > wrote:
> > >>> > >
> > > >>> > > > For now I marked the skipped tests we had (including Dask) as
> > > >>> > > > pytest.mark.xfail (meaning expected to fail). They will be
> > > >>> > > > executed and summarized as XFail tests and we will have to deal
> > > >>> > > > with them at some point.
> > >>> > > >
> > > >>> > > > I think we will have to decide if we want to keep it or not, and
> > > >>> > > > either remove both tests and executor or fix the tests.
> > >>> > > >
> > >>> > > > J.
> > >>> > > >
> > >>> > > > On Mon, Jan 13, 2020 at 4:40 PM Shaw, Damian P. <
> > >>> > > > [email protected]> wrote:
> > >>> > > >
> > > >>> > > > > FYI I used Dask instead of Local Executor when first starting
> > > >>> > > > > Airflow; it was a great way to make sure the Executor and
> > > >>> > > > > Scheduler weren’t tied to each other, with no difficulty in
> > > >>> > > > > set-up. But once I actually started deploying to multiple boxes
> > > >>> > > > > I needed queue names pretty quickly. So I'm not going to say
> > > >>> > > > > it's needed, but for me it was a helpful stepping stone.
> > >>> > > > >
> > >>> > > > >
> > >>> > > > > -----Original Message-----
> > >>> > > > > From: Ash Berlin-Taylor <[email protected]>
> > >>> > > > > Sent: Sunday, January 12, 2020 17:38
> > >>> > > > > To: [email protected]
> > >>> > > > > Cc: [email protected]
> > >>> > > > > Subject: Re: Remove Dask Executor in Airflow 2.0 ?
> > >>> > > > >
> > > >>> > > > > It hasn't been discussed before, but unlike the Mesos one this
> > > >>> > > > > one has seen a (tiny) bit of activity in 1.10, so at least one
> > > >>> > > > > person is using it:
> > > >>> > > > > https://github.com/apache/airflow/pull/5273
> > >>> > > > >
> > > >>> > > > > On Jan 12 2020, at 9:05 pm, Jarek Potiuk <[email protected]>
> > > >>> > > > > wrote:
> > > >>> > > > > > I am finishing the PR on separating integrations and
> > > >>> > > > > > improving our CI footprint
> > > >>> > > > > > (https://github.com/apache/airflow/pull/7091) but during
> > > >>> > > > > > this change I have found that we have an apparently
> > > >>> > > > > > dysfunctional DaskExecutor in Airflow 2.0.
> > >>> > > > > >
> > > >>> > > > > > There is a "test_dask_executor.py" for which all tests are
> > > >>> > > > > > skipped. And they fail when I try to run the tests. I tried
> > > >>> > > > > > to look for any reference in devlist archives but I couldn't
> > > >>> > > > > > find anything about it.
> > >>> > > > > >
> > > >>> > > > > > Can someone shed some light on this? Should we remove the
> > > >>> > > > > > Dask executor completely from Airflow 2.0? Or should we fix
> > > >>> > > > > > the tests/executor? Has it been discussed?
> > >>> > > > > >
> > >>> > > > > > J.
> > >>> > > > > >
> > >>> > > > > > --
> > >>> > > > > > Jarek Potiuk
> > > >>> > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >>> > > > > >
> > >>> > > > > > M: +48 660 796 129 <+48660796129>
> > >>> > > > > > [image: Polidea] <https://www.polidea.com/>
> > >>> > > > > >
> > >>> > > > >
> > >>> > > > >
> > >>> > > > >
> > >>> > > > >
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> >
> ===============================================================================
> > >>> > > > >
> > >>> > > > > Please access the attached hyperlink for an important
> > electronic
> > >>> > > > > communications disclaimer:
> > >>> > > > >
> http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> >
> ===============================================================================
> > >>> > > > >
> > >>> > > > >
> > >>> > > >
> > >>> > > >
> > >>> > > > --
> > >>> > > >
> > >>> > > > Jarek Potiuk
> > >>> > > > Polidea <https://www.polidea.com/> | Principal Software
> Engineer
> > >>> > > >
> > >>> > > > M: +48 660 796 129 <+48660796129>
> > >>> > > > [image: Polidea] <https://www.polidea.com/>
> > >>> > > >
> > >>> > >
> > >>> > >
> > >>> > > --
> > >>> > > Darren L. Weber, Ph.D.
> > >>> > > http://psdlw.users.sourceforge.net/
> > >>> > > http://psdlw.users.sourceforge.net/wordpress/
> > >>> > >
> > >>> >
> > >>> >
> > >>> > --
> > >>> >
> > >>> > Jarek Potiuk
> > >>> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >>> >
> > >>> > M: +48 660 796 129 <+48660796129>
> > >>> > [image: Polidea] <https://www.polidea.com/>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>>
> > >>> Jarek Potiuk
> > >>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >>>
> > >>> M: +48 660 796 129 <+48660796129>
> > >>> [image: Polidea] <https://www.polidea.com/>
> > >>
> > >>
> > >>
> > >> --
> > >>
> > >> Jarek Potiuk
> > >> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >>
> > >> M: +48 660 796 129 <+48660796129>
> > >> [image: Polidea] <https://www.polidea.com/>
> > >>
> > >>
> > >
> > > --
> > >
> > > Jarek Potiuk
> > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >
> > > M: +48 660 796 129 <+48660796129>
> > > [image: Polidea] <https://www.polidea.com/>
> > >
> > >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >
>
>
> --
> Darren L. Weber, Ph.D.
> http://psdlw.users.sourceforge.net/
> http://psdlw.users.sourceforge.net/wordpress/
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
