Hey Gurer,

Thanks for the summary. I have updated the format a little bit and added some items of my own. I left the old style intact for now, in case that turns out to be the more convenient format after all.

Bolke

> On 12 Dec 2016, at 17:04, Gurer Kiratli <[email protected]> wrote:
>
> Hi folks,
>
> Here is the list
> <https://cwiki.apache.org/confluence/display/AIRFLOW/2017+Roadmap+Items>
> of possible roadmap items for 2017. I think that clubbing deliverables
> into 1.9 or 2.0 is orthogonal to our high-level 2017 planning, so I went
> with this approach.
>
> Please take a look at the wiki by the end of the week and see if anything
> is missing or needs further clarification. I will send out a survey next
> week to get a sense of priorities across the community.
>
> Let me know if you have any questions.
>
> Cheers,
>
> Gurer
>
> On Tue, Dec 6, 2016 at 11:15 PM, Maxime Beauchemin <[email protected]> wrote:
>
>> I spoke with Gurer yesterday and he's going to summarize and send a survey.
>> It should be out this week.
>>
>> Max
>>
>> On Tue, Dec 6, 2016 at 7:24 PM, siddharth anand <[email protected]> wrote:
>>
>>> Max,
>>> Do you have time to summarize this thread? Perhaps publish it on the Wiki!
>>> -s
>>>
>>> On Thu, Dec 1, 2016 at 12:27 PM, Van Klaveren, Brian N. <[email protected]> wrote:
>>>
>>>> With the announcement of AWS Batch (https://aws.amazon.com/batch/), and
>>>> my own selfish needs, I think it'd be really great to generally support
>>>> batch systems like AWS Batch, Slurm, and Torque as executors, potentially
>>>> with an extension of the BashOperator, but I think it might actually be
>>>> flexible enough to not need a dedicated BatchOperator.
>>>>
>>>> Brian
>>>>
>>>>
>>>> On Nov 24, 2016, at 7:40 AM, Maycock, Luke <[email protected]> wrote:
>>>>
>>>> Add FK to dag_run to the task_instance table on Postgres so that
>>>> task_instances can be uniquely attributed to dag runs.
>>>>
>>>> +1
>>>>
>>>> Also, I believe xcoms would need to be addressed in the same way at the
>>>> same time. I have added a comment to that effect on
>>>> https://issues.apache.org/jira/browse/AIRFLOW-642
>>>>
>>>> I believe this would be implemented for all supported back-ends, not just
>>>> PostgreSQL.
>>>>
>>>> Cheers,
>>>> Luke Maycock
>>>> OLIVER WYMAN
>>>> [email protected]
>>>> www.oliverwyman.com
>>>>
>>>> ________________________________
>>>> From: Arunprasad Venkatraman <[email protected]>
>>>> Sent: 21 November 2016 18:16
>>>> To: [email protected]
>>>> Subject: Re: Airflow 2.0
>>>>
>>>> Add FK to dag_run to the task_instance table on Postgres so that
>>>> task_instances can be uniquely attributed to dag runs.
>>>> Ensure scheduler can be run continuously without needing restarts.
>>>> Ensure scheduler can handle tens of thousands of active workflows
>>>>
>>>> +1
>>>>
>>>> We are planning to run around 40,000 tasks a day using Airflow, and some
>>>> of them are critical for giving quick feedback to developers. Currently,
>>>> using the execution date to uniquely identify tasks does not work for us,
>>>> since we mainly trigger DAGs (instead of running them on a schedule), and
>>>> we have collided at the 1-second granularity on several occasions. Having
>>>> a task UUID, or associating dag_run with the task_instance table as
>>>> suggested by Sergei, will help mitigate this issue for us and would make
>>>> it easy for us to update task results too. We would be happy to start
>>>> working on this if it makes sense.
>>>>
>>>> Also, we are wondering whether any work has been done in the community to
>>>> support multiple schedulers (or alternatives to MySQL/Postgres), because a
>>>> single scheduler does not scale well for us and we sometimes see slowdowns
>>>> of up to a couple of minutes when there are several pending tasks.
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Mon, Nov 21, 2016 at 9:57 AM, Chris Riccomini <[email protected]> wrote:
>>>>
>>>> Ensure scheduler can be run continuously without needing restarts
>>>>
>>>> +1
>>>>
>>>> On Mon, Nov 21, 2016 at 5:25 AM, David Batista <[email protected]> wrote:
>>>> A small request, which might be handy.
>>>>
>>>> Having the possibility to select multiple tasks and mark them as
>>>> Success/Clear/etc.
>>>>
>>>> Allow the UI to select individual tasks (i.e., inside the Tree View) and
>>>> then have a button to mark them as Success/Clear/etc.
>>>>
>>>> On 21 November 2016 at 14:22, Sergei Iakhnin <[email protected]> wrote:
>>>>
>>>> I've been running Airflow on 1500 cores in the context of scientific
>>>> workflows for the past year and a half. Features that would be
>>>> important to me for 2.0:
>>>>
>>>> - Add FK to dag_run to the task_instance table on Postgres so that
>>>>   task_instances can be uniquely attributed to dag runs.
>>>> - Ensure scheduler can be run continuously without needing restarts.
>>>>   Right now it gets into some ill-determined bad state, forcing me to
>>>>   restart it every 20 minutes.
>>>> - Ensure scheduler can handle tens of thousands of active workflows.
>>>>   Right now this results in extremely long scheduling times and
>>>>   inconsistent scheduling even at 2 thousand active workflows.
>>>> - Add more flexible task scheduling prioritization. The default
>>>>   prioritization is the opposite of the behaviour I want.
>>>>   I would prefer that downstream tasks always have higher priority than
>>>>   upstream tasks, so that entire workflows tend to complete sooner,
>>>>   rather than scheduling tasks from other workflows. Having a few
>>>>   scheduling prioritization strategies would be beneficial here.
>>>> - Provide better support for manually triggered DAGs in the UI, i.e. by
>>>>   showing them as queued.
>>>> - Provide some resource management capabilities via something like slots
>>>>   that can be defined on workers and occupied by tasks. Using Celery's
>>>>   concurrency parameter at the Airflow server level is too coarse-grained,
>>>>   as it forces all workers to be the same and does not allow proper
>>>>   resource management when different workflow tasks have different
>>>>   resource requirements, thus hurting utilization (a worker could run 8
>>>>   parallel tasks with a small memory footprint, but only 1 task with a
>>>>   large memory footprint, for instance).
>>>>
>>>> With best regards,
>>>>
>>>> Sergei.
>>>>
>>>>
>>>> On Mon, Nov 21, 2016 at 2:00 PM Ryabchuk, Pavlo <[email protected]> wrote:
>>>>
>>>> -1. We rely heavily on data profiling as a pipeline health monitoring
>>>> tool.
>>>>
>>>> -----Original Message-----
>>>> From: Chris Riccomini [mailto:[email protected]]
>>>> Sent: Saturday, November 19, 2016 1:57 AM
>>>> To: [email protected]
>>>> Subject: Re: Airflow 2.0
>>>>
>>>> RIP out the charting application and the data profiler
>>>>
>>>> Yes please! +1
>>>>
>>>> On Fri, Nov 18, 2016 at 2:41 PM, Maxime Beauchemin <[email protected]> wrote:
>>>> Another point that may be controversial for Airflow 2.0: RIP out the
>>>> charting application and the data profiler. Even though it's nice to
>>>> have it there, it's just out of scope and has major security
>>>> issues/implications.
>>>>
>>>> I'm not sure how popular it actually is.
>>>> We may need to run a survey at some point around this kind of question.
>>>>
>>>> Max
>>>>
>>>> On Fri, Nov 18, 2016 at 2:39 PM, Maxime Beauchemin <[email protected]> wrote:
>>>>
>>>> Using FAB's Model, we get pretty much all of that (REST API, auth/perms,
>>>> CRUD) for free:
>>>> http://flask-appbuilder.readthedocs.io/en/latest/quickhowto.html?highlight=rest#exposed-methods
>>>>
>>>> I'm pretty intimate with FAB since I use it (and contributed to it)
>>>> for Superset/Caravel.
>>>>
>>>> All that's needed is to derive from FAB's model class instead of
>>>> SQLAlchemy's model class (FAB's model wraps SQLAlchemy's, adds
>>>> functionality to it, and is 100% compatible with it AFAICT).
>>>>
>>>> Max
>>>>
>>>> On Fri, Nov 18, 2016 at 2:07 PM, Chris Riccomini <[email protected]> wrote:
>>>>
>>>> It may be doable to run this as a different package
>>>> `airflow-webserver`, an alternate UI at first, and to eventually rip
>>>> out the old UI off of the main package.
>>>>
>>>> This is the same strategy that I was thinking of for AIRFLOW-85. You
>>>> can build the new UI in parallel, and then delete the old one later.
>>>> I really think that a REST interface should be a pre-req to any
>>>> large/new UI changes, though. Getting unified so that everything is
>>>> driven through REST will be a big win.
>>>>
>>>> On Fri, Nov 18, 2016 at 1:51 PM, Maxime Beauchemin <[email protected]> wrote:
>>>> A multi-tenant UI with composable roles on top of granular
>>>> permissions.
>>>>
>>>> Migrating from Flask-Admin to Flask App Builder would be an
>>>> easy-ish win (since they're both Flask). FAB provides a good
>>>> authentication and permission model that ships out-of-the-box with
>>>> a REST API. It suffices to define FAB models (derived from
>>>> SQLAlchemy's model) and you get a set of perms for the model
>>>> (can_show, can_list, can_add, can_change, can_delete, ...) and a set
>>>> of CRUD REST endpoints. It would also allow us to rip the
>>>> authentication backend code out of Airflow and rely on FAB for that.
>>>> Also, every single view gets permissions auto-created for it, and
>>>> there are easy ways to define row-level type filters based on user
>>>> permissions.
>>>>
>>>> It may be doable to run this as a different package
>>>> `airflow-webserver`, an alternate UI at first, and to eventually rip
>>>> out the old UI off of the main package.
>>>>
>>>> https://flask-appbuilder.readthedocs.io/en/latest/
>>>>
>>>> I'd love to carve some time and lead this.
>>>>
>>>> On Fri, Nov 18, 2016 at 1:32 PM, Chris Riccomini <[email protected]> wrote:
>>>>
>>>> Full-fledged REST API (that the UI also uses) would be great in 2.0.
>>>>
>>>> On Fri, Nov 18, 2016 at 6:26 AM, David Kegley <[email protected]> wrote:
>>>> Hi All,
>>>>
>>>> We have been using Airflow heavily for the last couple of months and
>>>> it’s been great so far. Here are a few things we’d like to see
>>>> prioritized in 2.0.
>>>>
>>>> 1) Role based access to DAGs:
>>>> We would like to see better role based access through the UI.
>>>> There’s a related ticket out there but it hasn’t seen any action in a
>>>> few months:
>>>> https://issues.apache.org/jira/browse/AIRFLOW-85
>>>>
>>>> We use a templating system to create/deploy DAGs dynamically based on
>>>> some directory/file structure. This allows analysts to quickly deploy
>>>> and schedule their ETL code without having to interact with the
>>>> Airflow installation directly. It would be great if those same
>>>> analysts could access their own DAGs in the UI so that they can
>>>> clear DAG runs, mark success, etc. while keeping them away from our
>>>> core ETL and other people's/organization's DAGs. Some of this can be
>>>> accomplished with ‘filter by owner’ but it doesn’t address the use
>>>> case where a DAG can be maintained by multiple users in the same
>>>> organization when they have separate Airflow user accounts.
>>>>
>>>> 2) An option to turn off backfill:
>>>> https://issues.apache.org/jira/browse/AIRFLOW-558
>>>> For cases where a DAG does an insert overwrite on a table every day.
>>>> This might be a realistic option for the current version but I just
>>>> wanted to call attention to this feature request.
>>>>
>>>> Best,
>>>> David
>>>>
>>>> On Nov 17, 2016, at 6:19 PM, Maxime Beauchemin <[email protected]> wrote:
>>>>
>>>> *This is a brainstorm email thread about Airflow 2.0!*
>>>>
>>>> I wanted to share some ideas around what I would like to do in Airflow
>>>> 2.0 and would love to hear what others are thinking. I'll compile the
>>>> ideas that are shared in this thread in a Wiki once the conversation
>>>> fades.
>>>>
>>>> -------------------------------------------
>>>>
>>>> First idea, to get the conversation started:
>>>>
>>>> *Breaking down the package*
>>>> `pip install airflow-common airflow-scheduler airflow-webserver airflow-operators-googlecloud ...`
>>>>
>>>> It seems to me like we're getting to a point where having different
>>>> repositories and different packages would make things much easier in
>>>> all sorts of ways. For instance, the web server is a lot less sensitive
>>>> than the scheduler, and changes to operators should/could be deployed
>>>> at will, independently from the main package. People could upgrade
>>>> only certain packages in their environment when needed. Travis builds
>>>> would be more targeted, and take less time, ...
>>>>
>>>> Also, the whole current "extras_require" approach to optional
>>>> dependencies (in setup.py) is kind of getting out of hand.
>>>>
>>>> Of course `pip install airflow` would bring in a collection of
>>>> sub-packages similar in functionality to what it does now, perhaps
>>>> without so many operators you probably don't need in your environment.
>>>>
>>>> The release process is the main pain-point and the biggest risk for
>>>> the project, and I feel like this is a solid solution to address it.
>>>>
>>>> Max
>>>>
>>>> --
>>>> Sergei
>>>>
>>>> --
>>>> *David Batista* *Data Engineer, HelloFresh Global*
>>>> Saarbrücker Str. 37a | 10405 Berlin
>>>> [email protected]
