Hey Gurer,

Thanks for the summary. I have updated the format a little bit and added some
items of my own. I left the old style intact for now, in case that is the more
convenient format after all.

Bolke

> On 12 Dec 2016, at 17:04, Gurer Kiratli <[email protected]> wrote:
> 
> Hi folks,
> 
> Here is the list
> <https://cwiki.apache.org/confluence/display/AIRFLOW/2017+Roadmap+Items> of
> possible roadmap items for 2017. I think that clubbing deliverables into
> 1.9 or 2.0 is orthogonal to our high level 2017 planning so I went with
> this approach.
> 
> Please take a look at the wiki by the end of the week and see if anything is
> missing or needs further clarification. I will send out a survey next week
> to get a sense of priorities across the community.
> 
> Let me know if you have any questions.
> 
> Cheers,
> 
> Gurer
> 
> On Tue, Dec 6, 2016 at 11:15 PM, Maxime Beauchemin <
> [email protected]> wrote:
> 
>> I spoke with Gurer yesterday and he's going to summarize and send a survey.
>> It should be out this week.
>> 
>> Max
>> 
>> On Tue, Dec 6, 2016 at 7:24 PM, siddharth anand <[email protected]> wrote:
>> 
>>> Max,
>>> Do you have time to summarize this thread? Perhaps publish it on the Wiki!
>>> -s
>>> 
>>> On Thu, Dec 1, 2016 at 12:27 PM, Van Klaveren, Brian N. <
>>> [email protected]> wrote:
>>> 
>>>> With the announcement of AWS Batch (https://aws.amazon.com/batch/), and
>>>> my own selfish needs, I think it'd be really great to generally support
>>>> batch systems like AWS Batch, Slurm, and Torque as executors, potentially
>>>> with an extension of the BashOperator, though I think the BashOperator
>>>> might actually be flexible enough that a dedicated BatchOperator is not
>>>> needed.
>>>> 
>>>> Brian
>>>> 
>>>> 
>>>> On Nov 24, 2016, at 7:40 AM, Maycock, Luke
>>>> <luke.maycock@affiliate.oliverwyman.com> wrote:
>>>> 
>>>> Add FK to dag_run to the task_instance table on Postgres so that
>>>> task_instances can be uniquely attributed to dag runs.
>>>> 
>>>> 
>>>> + 1
>>>> 
>>>> 
>>>> Also, I believe xcoms would need to be addressed in the same way at the
>>>> same time - I have added a comment to that effect on
>>>> https://issues.apache.org/jira/browse/AIRFLOW-642
>>>> 
>>>> 
>>>> I believe this would be implemented for all supported back-ends, not just
>>>> PostgreSQL.
>>>> 
>>>> 
>>>> Cheers,
>>>> Luke Maycock
>>>> OLIVER WYMAN
>>>> luke.maycock@affiliate.oliverwyman.com
>>>> www.oliverwyman.com
>>>> 
>>>> 
>>>> 
>>>> ________________________________
>>>> From: Arunprasad Venkatraman <[email protected]>
>>>> Sent: 21 November 2016 18:16
>>>> To: dev@airflow.incubator.apache.org
>>>> Subject: Re: Airflow 2.0
>>>> 
>>>> Add FK to dag_run to the task_instance table on Postgres so that
>>>> task_instances can be uniquely attributed to dag runs.
>>>> Ensure scheduler can be run continuously without needing restarts.
>>>> Ensure scheduler can handle tens of thousands of active workflows
>>>> 
>>>> +1
>>>> 
>>>> We are planning to run around 40,000 tasks a day using Airflow, and some
>>>> of them are critical for giving quick feedback to developers. Currently,
>>>> using the execution date to uniquely identify tasks does not work for us,
>>>> since we mainly trigger DAGs (instead of running them on a schedule), and
>>>> we have collided at the 1-second granularity on several occasions. Having
>>>> a task UUID, or associating dag_run with the task_instance table as
>>>> suggested by Sergei, would help mitigate this issue for us and would also
>>>> make it easy for us to update task results. We would be happy to start
>>>> working on this if it makes sense.
>>>> 
>>>> We are also wondering whether any work has been done in the community to
>>>> support multiple schedulers (or alternatives to MySQL/Postgres), because a
>>>> single scheduler does not scale well for us and we sometimes see slowdowns
>>>> of up to a couple of minutes when there are several pending tasks.
>>>> 
>>>> Thanks
>>>> 
>>>> 
>>>> 
>>>> On Mon, Nov 21, 2016 at 9:57 AM, Chris Riccomini <[email protected]>
>>>> wrote:
>>>> 
>>>> Ensure scheduler can be run continuously without needing restarts
>>>> 
>>>> +1
>>>> 
>>>> On Mon, Nov 21, 2016 at 5:25 AM, David Batista <[email protected]> wrote:
>>>> A small request, which might be handy.
>>>> 
>>>> Having the possibility to select multiple tasks and mark them as
>>>> Success/Clear/etc.
>>>> 
>>>> Allow the UI to select individual tasks (i.e., inside the Tree View) and
>>>> then have a button to mark them as Success/Clear/etc.
>>>> 
>>>> On 21 November 2016 at 14:22, Sergei Iakhnin <[email protected]> wrote:
>>>> 
>>>> I've been running Airflow on 1500 cores in the context of scientific
>>>> workflows for the past year and a half. Features that would be important
>>>> to me for 2.0:
>>>> 
>>>> - Add FK to dag_run to the task_instance table on Postgres so that
>>>> task_instances can be uniquely attributed to dag runs.
>>>> - Ensure scheduler can be run continuously without needing restarts.
>>>> Right now it gets into some ill-determined bad state forcing me to
>>>> restart it every 20 minutes.
>>>> - Ensure scheduler can handle tens of thousands of active workflows.
>>>> Right now this results in extremely long scheduling times and
>>>> inconsistent scheduling even at 2 thousand active workflows.
>>>> - Add more flexible task scheduling prioritization. The default
>>>> prioritization is the opposite of the behaviour I want. I would prefer
>>>> that downstream tasks always have higher priority than upstream tasks to
>>>> cause entire workflows to tend to complete sooner, rather than
>>>> scheduling tasks from other workflows. Having a few scheduling
>>>> prioritization strategies would be beneficial here.
>>>> - Provide better support for manually-triggered DAGs on the UI i.e. by
>>>> showing them as queued.
>>>> - Provide some resource management capabilities via something like slots
>>>> that can be defined on workers and occupied by tasks. Using celery's
>>>> concurrency parameter at the airflow server level is too coarse-grained
>>>> as it forces all workers to be the same, and does not allow proper
>>>> resource management when different workflow tasks have different
>>>> resource requirements thus hurting utilization (a worker could run 8
>>>> parallel tasks with small memory footprint, but only 1 task with large
>>>> memory footprint for instance).
>>>> 
>>>> With best regards,
>>>> 
>>>> Sergei.
>>>> 
>>>> 
>>>> On Mon, Nov 21, 2016 at 2:00 PM Ryabchuk, Pavlo <[email protected]> wrote:
>>>> 
>>>> -1. We rely heavily on data profiling as a pipeline health monitoring
>>>> tool.
>>>> 
>>>> -----Original Message-----
>>>> From: Chris Riccomini [mailto:[email protected]]
>>>> Sent: Saturday, November 19, 2016 1:57 AM
>>>> To: dev@airflow.incubator.apache.org
>>>> Subject: Re: Airflow 2.0
>>>> 
>>>> RIP out the charting application and the data profiler
>>>> 
>>>> Yes please! +1
>>>> 
>>>> On Fri, Nov 18, 2016 at 2:41 PM, Maxime Beauchemin <[email protected]>
>>>> wrote:
>>>> Another point that may be controversial for Airflow 2.0: RIP out the
>>>> charting application and the data profiler. Even though it's nice to
>>>> have it there, it's just out of scope and has major security
>>>> issues/implications.
>>>> 
>>>> I'm not sure how popular it actually is. We may need to run a survey
>>>> at some point around these kinds of questions.
>>>> 
>>>> Max
>>>> 
>>>> On Fri, Nov 18, 2016 at 2:39 PM, Maxime Beauchemin <[email protected]>
>>>> wrote:
>>>> 
>>>> Using FAB's Model, we get pretty much all of that (REST API, auth/perms,
>>>> CRUD) for free:
>>>> http://flask-appbuilder.readthedocs.io/en/latest/quickhowto.html?highlight=rest#exposed-methods
>>>> 
>>>> I'm pretty intimate with FAB since I use it (and contributed to it)
>>>> for Superset/Caravel.
>>>> 
>>>> All that's needed is to derive from FAB's model class instead of
>>>> SQLAlchemy's model class (which FAB's model wraps and adds
>>>> functionality to, and with which it is 100% compatible AFAICT).
>>>> 
>>>> Max
>>>> 
>>>> On Fri, Nov 18, 2016 at 2:07 PM, Chris Riccomini <[email protected]>
>>>> wrote:
>>>> 
>>>> It may be doable to run this as a different package `airflow-webserver`,
>>>> an alternate UI at first, and to eventually rip the old UI out of the
>>>> main package.
>>>> 
>>>> This is the same strategy that I was thinking of for AIRFLOW-85. You can
>>>> build the new UI in parallel, and then delete the old one later. I really
>>>> think that a REST interface should be a pre-req to any large/new UI
>>>> changes, though. Getting unified so that everything is driven through
>>>> REST will be a big win.
>>>> 
>>>> On Fri, Nov 18, 2016 at 1:51 PM, Maxime Beauchemin <[email protected]>
>>>> wrote:
>>>> A multi-tenant UI with composable roles on top of granular permissions.
>>>> 
>>>> Migrating from Flask-Admin to Flask App Builder would be an easy-ish win
>>>> (since they're both Flask). FAB provides a good authentication and
>>>> permission model that ships out-of-the-box with a REST API. It suffices to
>>>> define FAB models (derivatives of SQLAlchemy's model) and you get a set of
>>>> perms for the model (can_show, can_list, can_add, can_change, can_delete,
>>>> ...) and a set of CRUD REST endpoints. It would also allow us to rip the
>>>> authentication backend code out of Airflow and rely on FAB for that.
>>>> Also, every single view gets permissions auto-created for it, and there
>>>> are easy ways to define row-level type filters based on user permissions.
>>>> 
>>>> It may be doable to run this as a different package `airflow-webserver`,
>>>> an alternate UI at first, and to eventually rip the old UI out of the
>>>> main package.
>>>> 
>>>> https://flask-appbuilder.readthedocs.io/en/latest/
>>>> 
>>>> I'd love to carve some time and lead this.
>>>> 
>>>> On Fri, Nov 18, 2016 at 1:32 PM, Chris Riccomini <[email protected]>
>>>> wrote:
>>>> 
>>>> Full-fledged REST API (that the UI also uses) would be great in 2.0.
>>>> 
>>>> On Fri, Nov 18, 2016 at 6:26 AM, David Kegley <[email protected]> wrote:
>>>> Hi All,
>>>> 
>>>> We have been using Airflow heavily for the last couple months and it’s
>>>> been great so far. Here are a few things we’d like to see prioritized in
>>>> 2.0.
>>>> 
>>>> 1) Role based access to DAGs:
>>>> We would like to see better role based access through the UI. There’s a
>>>> related ticket out there but it hasn’t seen any action in a few months:
>>>> https://issues.apache.org/jira/browse/AIRFLOW-85
>>>> 
>>>> We use a templating system to create/deploy DAGs dynamically based on
>>>> some directory/file structure. This allows analysts to quickly deploy and
>>>> schedule their ETL code without having to interact with the Airflow
>>>> installation directly. It would be great if those same analysts could
>>>> access their own DAGs in the UI so that they can clear DAG runs, mark
>>>> success, etc. while keeping them away from our core ETL and other
>>>> people's/organization's DAGs. Some of this can be accomplished with
>>>> ‘filter by owner’ but it doesn’t address the use case where a DAG can be
>>>> maintained by multiple users in the same organization when they have
>>>> separate Airflow user accounts.
>>>> 
>>>> 2) An option to turn off backfill:
>>>> https://issues.apache.org/jira/browse/AIRFLOW-558
>>>> For cases where a DAG does an insert overwrite on a table every day.
>>>> This might be a realistic option for the current version but I just
>>>> wanted to call attention to this feature request.
>>>> 
>>>> Best,
>>>> David
>>>> 
>>>> On Nov 17, 2016, at 6:19 PM, Maxime Beauchemin <[email protected]> wrote:
>>>> 
>>>> *This is a brainstorm email thread about Airflow 2.0!*
>>>> 
>>>> I wanted to share some ideas around what I would like to do in Airflow
>>>> 2.0 and would love to hear what others are thinking. I'll compile the
>>>> ideas that are shared in this thread in a Wiki once the conversation
>>>> fades.
>>>> 
>>>> -------------------------------------------
>>>> 
>>>> First idea, to get the conversation started:
>>>> 
>>>> *Breaking down the package*
>>>> `pip install airflow-common airflow-scheduler airflow-webserver
>>>> airflow-operators-googlecloud ...`
>>>> 
>>>> It seems to me like we're getting to a point where having different
>>>> repositories and different packages would make things much easier in all
>>>> sorts of ways. For instance the web server is a lot less sensitive than
>>>> the scheduler, and changes to operators should/could be deployed at
>>>> will, independently from the main package. People in their environment
>>>> could upgrade only certain packages when needed. Travis builds would be
>>>> more targeted, and take less time, ...
>>>> 
>>>> Also, the whole current "extras_require" approach to optional
>>>> dependencies (in setup.py) is kind of getting out of hand.
>>>> 
>>>> Of course `pip install airflow` would bring in a collection of
>>>> sub-packages similar in functionality to what it does now, perhaps
>>>> without so many operators you probably don't need in your environment.
>>>> 
>>>> The release process is the main pain-point and the biggest risk for the
>>>> project, and I feel like this is a solid solution to address it.
>>>> 
>>>> Max
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> 
>>>> Sergei
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> *David Batista* *Data Engineer**, HelloFresh Global*
>>>> Saarbrücker Str. 37a | 10405 Berlin
>>>> [email protected]
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
