Hi folks,

Here is the list <https://cwiki.apache.org/confluence/display/AIRFLOW/2017+Roadmap+Items> of possible roadmap items for 2017. I think that clubbing deliverables into 1.9 or 2.0 is orthogonal to our high-level 2017 planning, so I went with this approach.

Please take a look at the wiki by the end of the week and see whether
anything is missing or needs further clarification. I will send out a
survey next week to get a sense of priorities across the community.

Let me know if you have any questions.

Cheers,
Gurer

On Tue, Dec 6, 2016 at 11:15 PM, Maxime Beauchemin <[email protected]> wrote:

> I spoke with Gurer yesterday and he's going to summarize and send a survey.
> It should be out this week.
>
> Max
>
> On Tue, Dec 6, 2016 at 7:24 PM, siddharth anand <[email protected]> wrote:
>
> > Max,
> > Do you have time to summarize this thread? Perhaps publish it on the
> > Wiki!
> > -s
> >
> > On Thu, Dec 1, 2016 at 12:27 PM, Van Klaveren, Brian N.
> > <[email protected]> wrote:
> >
> > > With the announcement of AWS Batch (https://aws.amazon.com/batch/),
> > > and my own selfish needs, I think it'd be really great to generally
> > > support batch systems like AWS Batch, Slurm, and Torque as executors,
> > > potentially with an extension of the BashOperator, but I think it
> > > might actually be flexible enough to not need a dedicated
> > > BatchOperator.
> > >
> > > Brian
> > >
> > > On Nov 24, 2016, at 7:40 AM, Maycock, Luke <[email protected]> wrote:
> > >
> > > Add FK to dag_run to the task_instance table on Postgres so that
> > > task_instances can be uniquely attributed to dag runs.
> > >
> > > +1
> > >
> > > Also, I believe xcoms would need to be addressed in the same way at
> > > the same time. I have added a comment to that effect on
> > > https://issues.apache.org/jira/browse/AIRFLOW-642
> > >
> > > I believe this would be implemented for all supported back-ends, not
> > > just PostgreSQL.
> > >
> > > Cheers,
> > > Luke Maycock
> > > OLIVER WYMAN
> > >
> > > ________________________________
> > > From: Arunprasad Venkatraman <[email protected]>
> > > Sent: 21 November 2016 18:16
> > > To: [email protected]
> > > Subject: Re: Airflow 2.0
> > >
> > > Add FK to dag_run to the task_instance table on Postgres so that
> > > task_instances can be uniquely attributed to dag runs.
> > > Ensure scheduler can be run continuously without needing restarts.
> > > Ensure scheduler can handle tens of thousands of active workflows
> > >
> > > +1
> > >
> > > We are planning to run around 40,000 tasks a day using Airflow, and
> > > some of them are critical to give quick feedback to developers.
> > > Currently, having the execution date uniquely identify tasks does not
> > > work for us since we mainly trigger DAGs (instead of running them on a
> > > schedule), and we collide at 1-second granularity on several
> > > occasions. Having a task uuid, or associating dag_run to the
> > > task_instance table as suggested by Sergei, will help mitigate this
> > > issue for us and would make it easy for us to update task results
> > > too. We would be happy to start working on this if it makes sense.
> > >
> > > Also, we are wondering if any work has been done in the community to
> > > support multiple schedulers (or alternatives to MySQL/Postgres),
> > > because one scheduler does not scale well for us and we sometimes see
> > > slowdowns of up to a couple of minutes when there are several pending
> > > tasks.
> > >
> > > Thanks
> > >
> > > On Mon, Nov 21, 2016 at 9:57 AM, Chris Riccomini <[email protected]>
> > > wrote:
> > >
> > > Ensure scheduler can be run continuously without needing restarts
> > >
> > > +1
> > >
> > > On Mon, Nov 21, 2016 at 5:25 AM, David Batista <[email protected]> wrote:
> > >
> > > A small request, which might be handy.
> > >
> > > Having the possibility to select multiple tasks and mark them as
> > > Success/Clear/etc.
> > >
> > > Allow the UI to select individual tasks (i.e., inside the Tree View)
> > > and then have a button to mark them as Success/Clear/etc.
> > >
> > > On 21 November 2016 at 14:22, Sergei Iakhnin <[email protected]> wrote:
> > >
> > > I've been running Airflow on 1500 cores in the context of scientific
> > > workflows for the past year and a half. Features that would be
> > > important to me for 2.0:
> > >
> > > - Add FK to dag_run to the task_instance table on Postgres so that
> > >   task_instances can be uniquely attributed to dag runs.
> > > - Ensure scheduler can be run continuously without needing restarts.
> > >   Right now it gets into some ill-determined bad state forcing me to
> > >   restart it every 20 minutes.
> > > - Ensure scheduler can handle tens of thousands of active workflows.
> > >   Right now this results in extremely long scheduling times and
> > >   inconsistent scheduling even at 2 thousand active workflows.
> > > - Add more flexible task scheduling prioritization. The default
> > >   prioritization is the opposite of the behaviour I want. I would
> > >   prefer that downstream tasks always have higher priority than
> > >   upstream tasks to cause entire workflows to tend to complete sooner,
> > >   rather than scheduling tasks from other workflows. Having a few
> > >   scheduling prioritization strategies would be beneficial here.
> > > - Provide better support for manually-triggered DAGs on the UI, i.e.
> > >   by showing them as queued.
> > > - Provide some resource management capabilities via something like
> > >   slots that can be defined on workers and occupied by tasks. Using
> > >   celery's concurrency parameter at the airflow server level is too
> > >   coarse-grained as it forces all workers to be the same, and does not
> > >   allow proper resource management when different workflow tasks have
> > >   different resource requirements, thus hurting utilization (a worker
> > >   could run 8 parallel tasks with small memory footprint, but only 1
> > >   task with large memory footprint, for instance).
> > >
> > > With best regards,
> > >
> > > Sergei.
> > >
> > > On Mon, Nov 21, 2016 at 2:00 PM Ryabchuk, Pavlo <[email protected]>
> > > wrote:
> > >
> > > -1. We rely heavily on data profiling as a pipeline health
> > > monitoring tool.
> > >
> > > -----Original Message-----
> > > From: Chris Riccomini [mailto:[email protected]]
> > > Sent: Saturday, November 19, 2016 1:57 AM
> > > To: [email protected]
> > > Subject: Re: Airflow 2.0
> > >
> > > Rip out the charting application and the data profiler
> > >
> > > Yes please! +1
> > >
> > > On Fri, Nov 18, 2016 at 2:41 PM, Maxime Beauchemin <[email protected]>
> > > wrote:
> > >
> > > Another point that may be controversial for Airflow 2.0: rip out the
> > > charting application and the data profiler. Even though it's nice to
> > > have it there, it's just out of scope and has major security
> > > issues/implications.
> > >
> > > I'm not sure how popular it actually is. We may need to run a survey
> > > at some point around these kinds of questions.
> > >
> > > Max
> > >
> > > On Fri, Nov 18, 2016 at 2:39 PM, Maxime Beauchemin <[email protected]>
> > > wrote:
> > >
> > > Using FAB's Model, we get pretty much all of that (REST API,
> > > auth/perms, CRUD) for free:
> > > http://flask-appbuilder.readthedocs.io/en/latest/quickhowto.html?highlight=rest#exposed-methods
> > >
> > > I'm pretty intimate with FAB since I use it (and contributed to it)
> > > for Superset/Caravel.
> > >
> > > All that's needed is to derive FAB's model class instead of
> > > SQLAlchemy's model class (which FAB's model wraps and adds
> > > functionality to and is 100% compatible AFAICT).
> > >
> > > Max
> > >
> > > On Fri, Nov 18, 2016 at 2:07 PM, Chris Riccomini <[email protected]>
> > > wrote:
> > >
> > > It may be doable to run this as a different package
> > > `airflow-webserver`, an alternate UI at first, and to eventually rip
> > > out the old UI off of the main package.
> > >
> > > This is the same strategy that I was thinking of for AIRFLOW-85. You
> > > can build the new UI in parallel, and then delete the old one later.
> > > I really think that a REST interface should be a pre-req to any
> > > large/new UI changes, though. Getting unified so that everything is
> > > driven through REST will be a big win.
> > >
> > > On Fri, Nov 18, 2016 at 1:51 PM, Maxime Beauchemin <[email protected]>
> > > wrote:
> > >
> > > A multi-tenant UI with composable roles on top of granular
> > > permissions.
> > >
> > > Migrating from Flask-Admin to Flask App Builder would be an easy-ish
> > > win (since they're both Flask). FAB provides a good authentication
> > > and permission model that ships out-of-the-box with a REST API. It
> > > suffices to define FAB models (derivative of SQLAlchemy's model) and
> > > you get a set of perms for the model (can_show, can_list, can_add,
> > > can_change, can_delete, ...) and a set of CRUD REST endpoints. It
> > > would also allow us to rip the authentication backend code out of
> > > Airflow and rely on FAB for that. Also, every single view gets
> > > permissions auto-created for it, and there are easy ways to define
> > > row-level type filters based on user permissions.
> > >
> > > It may be doable to run this as a different package
> > > `airflow-webserver`, an alternate UI at first, and to eventually rip
> > > out the old UI off of the main package.
> > >
> > > https://flask-appbuilder.readthedocs.io/en/latest/
> > >
> > > I'd love to carve some time and lead this.
> > >
> > > On Fri, Nov 18, 2016 at 1:32 PM, Chris Riccomini <[email protected]>
> > > wrote:
> > >
> > > Full-fledged REST API (that the UI also uses) would be great in 2.0.
> > >
> > > On Fri, Nov 18, 2016 at 6:26 AM, David Kegley <[email protected]> wrote:
> > >
> > > Hi All,
> > >
> > > We have been using Airflow heavily for the last couple months and
> > > it’s been great so far. Here are a few things we’d like to see
> > > prioritized in 2.0.
> > >
> > > 1) Role based access to DAGs:
> > > We would like to see better role based access through the UI. There’s
> > > a related ticket out there but it hasn’t seen any action in a few
> > > months:
> > > https://issues.apache.org/jira/browse/AIRFLOW-85
> > >
> > > We use a templating system to create/deploy DAGs dynamically based on
> > > some directory/file structure. This allows analysts to quickly deploy
> > > and schedule their ETL code without having to interact with the
> > > Airflow installation directly. It would be great if those same
> > > analysts could access their own DAGs in the UI so that they can clear
> > > DAG runs, mark success, etc. while keeping them away from our core
> > > ETL and other people's/organization's DAGs. Some of this can be
> > > accomplished with ‘filter by owner’ but it doesn’t address the use
> > > case where a DAG can be maintained by multiple users in the same
> > > organization when they have separate Airflow user accounts.
> > >
> > > 2) An option to turn off backfill:
> > > https://issues.apache.org/jira/browse/AIRFLOW-558
> > > For cases where a DAG does an insert overwrite on a table every day.
> > > This might be a realistic option for the current version but I just
> > > wanted to call attention to this feature request.
> > >
> > > Best,
> > > David
> > >
> > > On Nov 17, 2016, at 6:19 PM, Maxime Beauchemin <[email protected]>
> > > wrote:
> > >
> > > *This is a brainstorm email thread about Airflow 2.0!*
> > >
> > > I wanted to share some ideas around what I would like to do in
> > > Airflow 2.0 and would love to hear what others are thinking. I'll
> > > compile the ideas that are shared in this thread in a Wiki once the
> > > conversation fades.
> > >
> > > -------------------------------------------
> > >
> > > First idea, to get the conversation started:
> > >
> > > *Breaking down the package*
> > > `pip install airflow-common airflow-scheduler airflow-webserver
> > > airflow-operators-googlecloud ...`
> > >
> > > It seems to me like we're getting to a point where having different
> > > repositories and different packages would make things much easier in
> > > all sorts of ways. For instance the web server is a lot less
> > > sensitive than the scheduler, and changes to operators should/could
> > > be deployed at will, independently from the main package. People in
> > > their environment could upgrade only certain packages when needed.
> > > Travis builds would be more targeted, and take less time, ...
> > >
> > > Also, the whole current "extras_require" approach to optional
> > > dependencies (in setup.py) is kind of getting out of hand.
> > >
> > > Of course `pip install airflow` would bring in a collection of
> > > sub-packages similar in functionality to what it does now, perhaps
> > > without so many operators you probably don't need in your
> > > environment.
> > >
> > > The release process is the main pain-point and the biggest risk for
> > > the project, and I feel like this is a solid solution to address it.
> > >
> > > Max
> > >
> > > --
> > > Sergei
> > >
> > > --
> > > David Batista
> > > Data Engineer, HelloFresh Global
