Max,
Do you have time to summarize this thread? Perhaps publish it on the Wiki!
-s

On Thu, Dec 1, 2016 at 12:27 PM, Van Klaveren, Brian N. <
[email protected]> wrote:

> With the announcement of AWS Batch (https://aws.amazon.com/batch/), and
> my own selfish needs, I think it'd be really great to generally support
> Batch systems like AWS Batch, Slurm, and Torque as executors, potentially
> with an extension of the BashOperator, but I think it might actually be
> flexible enough to not need a dedicated BatchOperator.
>
> Brian
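A minimal sketch of the BashOperator-style wrapping Brian describes, targeting Slurm (the function and its defaults are hypothetical illustrations; the `sbatch` flags are standard Slurm options). A BatchOperator-like wrapper could hand the resulting string to a BashOperator, so no dedicated executor is strictly required:

```python
import shlex

def build_sbatch_command(job_name, bash_command, cpus=1, mem_mb=1024):
    """Compose an `sbatch --wrap` invocation for a single task.

    The quoted bash_command is what a BashOperator would otherwise run
    directly; here it is submitted to the batch system instead.
    """
    return (
        f"sbatch --job-name={shlex.quote(job_name)} "
        f"--cpus-per-task={cpus} --mem={mem_mb} "
        f"--wrap={shlex.quote(bash_command)}"
    )
```

The same shape would apply to Torque (`qsub`) or AWS Batch (`submit-job` via the CLI/SDK), with only the command template changing.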
>
>
> On Nov 24, 2016, at 7:40 AM, Maycock, Luke <[email protected]> wrote:
>
> Add FK to dag_run to the task_instance table on Postgres so that
> task_instances can be uniquely attributed to dag runs.
>
>
> +1
>
>
> Also, I believe XComs would need to be addressed in the same way at the
> same time; I have added a comment to that effect on
> https://issues.apache.org/jira/browse/AIRFLOW-642
>
>
> I believe this would be implemented for all supported back-ends, not just
> PostgreSQL.
>
>
> Cheers,
> Luke Maycock
> OLIVER WYMAN
> [email protected]<mailto:luke.
> [email protected]><mailto:luke.maycock@
> affiliate.oliverwyman.com>
> www.oliverwyman.com<http://www.oliverwyman.com><http://
> www.oliverwyman.com/>
>
>
>
> ________________________________
> From: Arunprasad Venkatraman <[email protected]>
> Sent: 21 November 2016 18:16
> To: [email protected]
> Subject: Re: Airflow 2.0
>
> Add FK to dag_run to the task_instance table on Postgres so that
> task_instances can be uniquely attributed to dag runs.
> Ensure scheduler can be run continuously without needing restarts.
> Ensure scheduler can handle tens of thousands of active workflows
>
> +1
>
> We are planning to run around 40,000 tasks a day using Airflow, and some
> of them are critical for giving quick feedback to developers. Currently,
> using execution date to uniquely identify tasks does not work for us,
> since we mainly trigger DAGs (rather than running them on a schedule)
> and we collide with the one-second granularity on several occasions.
> Having a task UUID, or associating dag_run with task_instance as Sergei
> suggested, would help mitigate this issue for us and would make it easy
> for us to update task results too. We would be happy to start working on
> this if it makes sense.
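To make the proposed schema change concrete, here is a minimal sketch on an in-memory SQLite database (column names are simplified, not Airflow's actual schema): a surrogate `dag_run` id becomes the way a `task_instance` points at its run, so two manually-triggered runs in the same second no longer collide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# A surrogate primary key on dag_run, instead of relying on
# (dag_id, execution_date) to be unique.
conn.execute("""
    CREATE TABLE dag_run (
        id INTEGER PRIMARY KEY,
        dag_id TEXT NOT NULL,
        execution_date TEXT NOT NULL
    )
""")
# task_instance references the run directly via an FK.
conn.execute("""
    CREATE TABLE task_instance (
        task_id TEXT NOT NULL,
        dag_run_id INTEGER NOT NULL REFERENCES dag_run(id),
        state TEXT,
        PRIMARY KEY (task_id, dag_run_id)
    )
""")
# Two triggered runs of the same DAG at the same second each get
# their own dag_run row, so their task instances stay distinct.
cur = conn.execute(
    "INSERT INTO dag_run (dag_id, execution_date) VALUES (?, ?)",
    ("my_dag", "2016-11-21T18:16:00"),
)
run_a = cur.lastrowid
cur = conn.execute(
    "INSERT INTO dag_run (dag_id, execution_date) VALUES (?, ?)",
    ("my_dag", "2016-11-21T18:16:00"),
)
run_b = cur.lastrowid
```

The same shape would apply to XComs: an `xcom.dag_run_id` FK instead of keying on execution date.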
>
> Also, we are wondering whether any work has been done in the community
> to support multiple schedulers (or alternatives to MySQL/Postgres),
> because one scheduler does not scale well for us, and we sometimes see
> slowdowns of up to a couple of minutes when there are several pending
> tasks.
>
> Thanks
>
>
>
> On Mon, Nov 21, 2016 at 9:57 AM, Chris Riccomini <[email protected]> wrote:
>
> Ensure scheduler can be run continuously without needing restarts
>
> +1
>
> On Mon, Nov 21, 2016 at 5:25 AM, David Batista <[email protected]> wrote:
> A small request, which might be handy.
>
> Having the possibility to select multiple tasks and mark them as
> Success/Clear/etc.
>
> Allow selecting individual tasks in the UI (e.g., inside the Tree View)
> and then have a button to mark them as Success/Clear/etc.
>
> On 21 November 2016 at 14:22, Sergei Iakhnin <[email protected]> wrote:
>
> I've been running Airflow on 1500 cores in the context of scientific
> workflows for the past year and a half. Features that would be
> important to me for 2.0:
>
> - Add FK to dag_run to the task_instance table on Postgres so that
> task_instances can be uniquely attributed to dag runs.
> - Ensure the scheduler can be run continuously without needing
> restarts. Right now it gets into some ill-determined bad state,
> forcing me to restart it every 20 minutes.
> - Ensure the scheduler can handle tens of thousands of active
> workflows. Right now this results in extremely long scheduling times
> and inconsistent scheduling even at two thousand active workflows.
> - Add more flexible task scheduling prioritization. The default
> prioritization is the opposite of the behaviour I want: I would prefer
> that downstream tasks always have higher priority than upstream tasks,
> so that entire workflows tend to complete sooner, rather than tasks
> being scheduled from other workflows. Having a few scheduling
> prioritization strategies would be beneficial here.
> - Provide better support for manually-triggered DAGs in the UI, e.g.
> by showing them as queued.
> - Provide some resource management capabilities via something like
> slots that can be defined on workers and occupied by tasks. Using
> Celery's concurrency parameter at the Airflow server level is too
> coarse-grained, as it forces all workers to be the same and does not
> allow proper resource management when different workflow tasks have
> different resource requirements, thus hurting utilization (a worker
> could run 8 parallel tasks with a small memory footprint, but only 1
> task with a large memory footprint, for instance).
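One possible strategy for the prioritization point above (a sketch only; this is not an existing Airflow setting, and the names are illustrative): weight each task by its depth in the DAG, so downstream tasks always outrank upstream ones and in-flight workflows tend to drain before fresh roots are scheduled.

```python
def depth_priority(task, upstream):
    """Priority = distance from the DAG's roots.

    Deeper (more downstream) tasks get a higher weight, the opposite
    of a strategy that favors starting new workflows. `upstream` maps
    each task id to a list of its direct upstream task ids.
    """
    parents = upstream.get(task, [])
    if not parents:
        return 0
    return 1 + max(depth_priority(p, upstream) for p in parents)
```

A scheduler offering a few such strategies (depth-first, breadth-first, user-weighted) could cover the different preferences voiced in this thread.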
>
> With best regards,
>
> Sergei.
>
>
> On Mon, Nov 21, 2016 at 2:00 PM Ryabchuk, Pavlo <[email protected]> wrote:
>
> -1. We rely heavily on data profiling as a pipeline health monitoring
> tool.
>
> -----Original Message-----
> From: Chris Riccomini [mailto:[email protected]]
> Sent: Saturday, November 19, 2016 1:57 AM
> To: [email protected]
> Subject: Re: Airflow 2.0
>
> RIP out the charting application and the data profiler
>
> Yes please! +1
>
> On Fri, Nov 18, 2016 at 2:41 PM, Maxime Beauchemin <[email protected]> wrote:
> Another point that may be controversial for Airflow 2.0: RIP out the
> charting application and the data profiler. Even though it's nice to
> have them there, they're just out of scope and have major security
> issues/implications.
>
> I'm not sure how popular they actually are. We may need to run a
> survey at some point around these kinds of questions.
>
> Max
>
> On Fri, Nov 18, 2016 at 2:39 PM, Maxime Beauchemin <[email protected]> wrote:
>
> Using FAB's Model, we get pretty much all of that (REST API,
> auth/perms, CRUD) for free:
> http://flask-appbuilder.readthedocs.io/en/latest/quickhowto.html?highlight=rest#exposed-methods
>
> I'm pretty intimate with FAB since I use it (and contributed to it)
> for Superset/Caravel.
>
> All that's needed is to derive from FAB's model class instead of
> SQLAlchemy's model class (which FAB's model wraps and adds
> functionality to, and is 100% compatible with AFAICT).
>
> Max
>
> On Fri, Nov 18, 2016 at 2:07 PM, Chris Riccomini
> <[email protected]> wrote:
>
> It may be doable to run this as a different package
> `airflow-webserver`, an alternate UI at first, and to eventually rip
> the old UI out of the main package.
>
> This is the same strategy that I was thinking of for AIRFLOW-85. You
> can build the new UI in parallel, and then delete the old one later.
> I really think that a REST interface should be a pre-req to any
> large/new UI changes, though. Getting unified so that everything is
> driven through REST will be a big win.
>
> On Fri, Nov 18, 2016 at 1:51 PM, Maxime Beauchemin
> <[email protected]> wrote:
> A multi-tenant UI with composable roles on top of granular
> permissions.
>
> Migrating from Flask-Admin to Flask App Builder would be an easy-ish
> win (since they're both Flask). FAB provides a good authentication and
> permission model that ships out-of-the-box with a REST API. It
> suffices to define FAB models (derivatives of SQLAlchemy's model) and
> you get a set of perms for the model (can_show, can_list, can_add,
> can_change, can_delete, ...) and a set of CRUD REST endpoints. It
> would also allow us to rip the authentication backend code out of
> Airflow and rely on FAB for that. Also, every single view gets
> permissions auto-created for it, and there are easy ways to define
> row-level-type filters based on user permissions.
>
> It may be doable to run this as a different package
> `airflow-webserver`, an alternate UI at first, and to eventually rip
> the old UI out of the main package.
>
> http://flask-appbuilder.readthedocs.io/en/latest/
>
> I'd love to carve some time and lead this.
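A framework-free sketch of the permission behavior Max describes (this only mimics FAB's auto-created per-view permissions; the real Flask-AppBuilder API differs, and the action names here follow the email rather than FAB's exact vocabulary):

```python
# The per-view CRUD actions Max lists; FAB attaches one permission per
# action to each registered model view.
CRUD_ACTIONS = ("can_show", "can_list", "can_add", "can_change", "can_delete")

def permissions_for(model_name):
    """Enumerate the (action, view) permission pairs that would be
    auto-created for a model's CRUD view. Illustrative only.
    """
    view = f"{model_name}ModelView"
    return [(action, view) for action in CRUD_ACTIONS]
```

Composable roles then become sets of such pairs, which is what would enable per-DAG access control like the use case David Kegley raises later in the thread.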
>
> On Fri, Nov 18, 2016 at 1:32 PM, Chris Riccomini
> <[email protected]> wrote:
>
> Full-fledged REST API (that the UI also uses) would be great in
> 2.0.
>
> On Fri, Nov 18, 2016 at 6:26 AM, David Kegley <[email protected]> wrote:
> Hi All,
>
> We have been using Airflow heavily for the last couple of months and
> it’s been great so far. Here are a few things we’d like to see
> prioritized in 2.0.
>
> 1) Role based access to DAGs:
> We would like to see better role-based access through the UI. There’s
> a related ticket out there, but it hasn’t seen any action in a few
> months:
> https://issues.apache.org/jira/browse/AIRFLOW-85
>
> We use a templating system to create/deploy DAGs dynamically based on
> some directory/file structure. This allows analysts to quickly deploy
> and schedule their ETL code without having to interact with the
> Airflow installation directly. It would be great if those same
> analysts could access their own DAGs in the UI so that they can clear
> DAG runs, mark success, etc., while keeping them away from our core
> ETL and other people's/organization's DAGs. Some of this can be
> accomplished with ‘filter by owner’, but it doesn’t address the use
> case where a DAG is maintained by multiple users in the same
> organization when they have separate Airflow user accounts.
>
> 2) An option to turn off backfill:
> https://issues.apache.org/jira/browse/AIRFLOW-558
> For cases where a DAG does an insert overwrite on a table every day.
> This might be a realistic option for the current version, but I just
> wanted to call attention to this feature request.
>
> Best,
> David
>
> On Nov 17, 2016, at 6:19 PM, Maxime Beauchemin <[email protected]> wrote:
>
> *This is a brainstorm email thread about Airflow 2.0!*
>
> I wanted to share some ideas around what I would like to do in Airflow
> 2.0 and would love to hear what others are thinking. I'll compile the
> ideas that are shared in this thread in a Wiki once the conversation
> fades.
>
> -------------------------------------------
>
> First idea, to get the conversation started:
>
> *Breaking down the package*
> `pip install airflow-common airflow-scheduler airflow-webserver
> airflow-operators-googlecloud ...`
>
> It seems to me like we're getting to a point where having different
> repositories and different packages would make things much easier in
> all sorts of ways. For instance, the web server is a lot less
> sensitive than the scheduler, and changes to operators should/could be
> deployed at will, independently from the main package. People could
> upgrade only certain packages in their environment when needed. Travis
> builds would be more targeted, and take less time, ...
>
> Also, the whole current "extras_require" approach to optional
> dependencies (in setup.py) is kind of getting out of hand.
>
> Of course `pip install airflow` would bring in a collection of
> sub-packages similar in functionality to what it does now, perhaps
> without so many of the operators you probably don't need in your
> environment.
>
> The release process is the main pain-point and the biggest risk for
> the project, and I feel like this is a solid solution to address it.
>
> Max
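A hypothetical sketch of the dependency layout such a split might imply (all package and dependency names here are made up for illustration, not an agreed-upon design):

```python
# Each sub-package declares only what it needs; `airflow` itself
# becomes a thin meta-package pulling in a sensible default set,
# replacing today's sprawling extras_require block.
SPLIT_PACKAGES = {
    "airflow-common": ["sqlalchemy", "jinja2"],
    "airflow-scheduler": ["airflow-common"],
    "airflow-webserver": ["airflow-common", "flask"],
    "airflow-operators-googlecloud": [
        "airflow-common",
        "google-api-python-client",
    ],
    "airflow": ["airflow-scheduler", "airflow-webserver"],
}

def install_closure(package, registry):
    """Resolve the transitive dependency set for one package."""
    deps = set()
    for dep in registry.get(package, []):
        deps.add(dep)
        deps |= install_closure(dep, registry)
    return deps
```

Under this layout, installing `airflow` pulls in the core pieces, while niche operator packages stay opt-in and can be released on their own cadence.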
>
>
>
>
>
>
> --
>
> Sergei
>
>
>
>
> --
> *David Batista* *Data Engineer**, HelloFresh Global*
> Saarbrücker Str. 37a | 10405 Berlin
> [email protected]
>
>
>
