-1. We rely heavily on data profiling as a pipeline health monitoring tool.
-----Original Message-----
From: Chris Riccomini [mailto:[email protected]]
Sent: Saturday, November 19, 2016 1:57 AM
To: [email protected]
Subject: Re: Airflow 2.0

> RIP out the charting application and the data profiler

Yes please! +1

On Fri, Nov 18, 2016 at 2:41 PM, Maxime Beauchemin <[email protected]> wrote:

> Another point that may be controversial for Airflow 2.0: RIP out the
> charting application and the data profiler. Even though it's nice to
> have it there, it's just out of scope and has major security
> issues/implications.
>
> I'm not sure how popular it actually is. We may need to run a survey
> at some point around these kinds of questions.
>
> Max
>
> On Fri, Nov 18, 2016 at 2:39 PM, Maxime Beauchemin <[email protected]> wrote:
>
>> Using FAB's Model, we get pretty much all of that (REST API, auth/perms,
>> CRUD) for free:
>> http://flask-appbuilder.readthedocs.io/en/latest/quickhowto.html?highlight=rest#exposed-methods
>>
>> I'm pretty intimate with FAB since I use it (and contributed to it)
>> for Superset/Caravel.
>>
>> All that's needed is to derive FAB's model class instead of
>> SQLAlchemy's model class (which FAB's model wraps and adds
>> functionality to and is 100% compatible AFAICT).
>>
>> Max
>>
>> On Fri, Nov 18, 2016 at 2:07 PM, Chris Riccomini <[email protected]> wrote:
>>
>>> > It may be doable to run this as a different package `airflow-webserver`, an
>>> > alternate UI at first, and to eventually rip out the old UI off of the main
>>> > package.
>>>
>>> This is the same strategy that I was thinking of for AIRFLOW-85. You
>>> can build the new UI in parallel, and then delete the old one later.
>>> I really think that a REST interface should be a pre-req to any
>>> large/new UI changes, though. Getting unified so that everything is
>>> driven through REST will be a big win.
>>>
>>> On Fri, Nov 18, 2016 at 1:51 PM, Maxime Beauchemin <[email protected]> wrote:
>>> > A multi-tenant UI with composable roles on top of granular permissions.
>>> >
>>> > Migrating from Flask-Admin to Flask App Builder would be an
>>> > easy-ish win (since they're both Flask). FAB provides a good
>>> > authentication and permission model that ships out-of-the-box with
>>> > a REST API. It suffices to define FAB models (derivatives of
>>> > SQLAlchemy's model) and you get a set of perms for the model
>>> > (can_show, can_list, can_add, can_change, can_delete, ...) and a
>>> > set of CRUD REST endpoints. It would also allow us to rip the
>>> > authentication backend code out of Airflow and rely on FAB for that.
>>> > Also every single view gets permissions auto-created for it, and
>>> > there are easy ways to define row-level type filters based on user
>>> > permissions.
>>> >
>>> > It may be doable to run this as a different package `airflow-webserver`, an
>>> > alternate UI at first, and to eventually rip out the old UI off of the main
>>> > package.
>>> >
>>> > https://flask-appbuilder.readthedocs.io/en/latest/
>>> >
>>> > I'd love to carve some time and lead this.
>>> >
>>> > On Fri, Nov 18, 2016 at 1:32 PM, Chris Riccomini <[email protected]> wrote:
>>> >
>>> >> Full-fledged REST API (that the UI also uses) would be great in 2.0.
>>> >>
>>> >> On Fri, Nov 18, 2016 at 6:26 AM, David Kegley <[email protected]> wrote:
>>> >> > Hi All,
>>> >> >
>>> >> > We have been using Airflow heavily for the last couple months and it’s
>>> >> > been great so far. Here are a few things we’d like to see prioritized in
>>> >> > 2.0.
>>> >> >
>>> >> > 1) Role based access to DAGs:
>>> >> > We would like to see better role based access through the UI. There’s a
>>> >> > related ticket out there but it hasn’t seen any action in a few months:
>>> >> > https://issues.apache.org/jira/browse/AIRFLOW-85
>>> >> >
>>> >> > We use a templating system to create/deploy DAGs dynamically based on
>>> >> > some directory/file structure. This allows analysts to quickly deploy and
>>> >> > schedule their ETL code without having to interact with the Airflow
>>> >> > installation directly. It would be great if those same analysts could
>>> >> > access their own DAGs in the UI so that they can clear DAG runs, mark
>>> >> > success, etc. while keeping them away from our core ETL and other
>>> >> > people's/organization's DAGs. Some of this can be accomplished with ‘filter
>>> >> > by owner’ but it doesn’t address the use case where a DAG can be maintained
>>> >> > by multiple users in the same organization when they have separate Airflow
>>> >> > user accounts.
>>> >> >
>>> >> > 2) An option to turn off backfill:
>>> >> > https://issues.apache.org/jira/browse/AIRFLOW-558
>>> >> > For cases where a DAG does an insert overwrite on a table every day.
>>> >> > This might be a realistic option for the current version but I just wanted
>>> >> > to call attention to this feature request.
>>> >> >
>>> >> > Best,
>>> >> > David
>>> >> >
>>> >> > On Nov 17, 2016, at 6:19 PM, Maxime Beauchemin <[email protected]> wrote:
>>> >> >
>>> >> > *This is a brainstorm email thread about Airflow 2.0!*
>>> >> >
>>> >> > I wanted to share some ideas around what I would like to do in Airflow 2.0
>>> >> > and would love to hear what others are thinking. I'll compile the ideas
>>> >> > that are shared in this thread in a Wiki once the conversation fades.
>>> >> >
>>> >> > -------------------------------------------
>>> >> >
>>> >> > First idea, to get the conversation started:
>>> >> >
>>> >> > *Breaking down the package*
>>> >> > `pip install airflow-common airflow-scheduler airflow-webserver
>>> >> > airflow-operators-googlecloud ...`
>>> >> >
>>> >> > It seems to me like we're getting to a point where having different
>>> >> > repositories and different packages would make things much easier in all
>>> >> > sorts of ways. For instance the web server is a lot less sensitive than the
>>> >> > scheduler, and changes to operators should/could be deployed at will,
>>> >> > independently from the main package. People in their environment could
>>> >> > upgrade only certain packages when needed. Travis builds would be more
>>> >> > targeted, and take less time, ...
>>> >> >
>>> >> > Also, the whole current "extras_require" approach to optional dependencies
>>> >> > (in setup.py) is kind of getting out of hand.
>>> >> >
>>> >> > Of course `pip install airflow` would bring in a collection of sub-packages
>>> >> > similar in functionality to what it does now, perhaps without so many
>>> >> > operators you probably don't need in your environment.
>>> >> >
>>> >> > The release process is the main pain-point and the biggest risk for the
>>> >> > project, and I feel like this is a solid solution to address it.
>>> >> >
>>> >> > Max
