Totally agree on all your points, Sid. My feeling is that at the moment the most critical thing for the project is to get a release out and get to a steady pace of high-quality releases.
Somehow, breaking down the package seems to me like it would really help with the release process. Maybe an idea is to make 1.9 a release made of smaller packages where airflow = `(airflow-core + airflow-operators + airflow-webserver)` or something like that. I'm thinking it would allow us to release often on airflow-operators & airflow-webserver.

Max

On Fri, Nov 18, 2016 at 5:34 PM, siddharth anand <[email protected]> wrote:

> David
> https://issues.apache.org/jira/browse/AIRFLOW-558 (i.e.
> https://github.com/apache/incubator-airflow/pull/1830 ) is on my plate. I
> have already gone through many rounds of reviews, testing, and fixes with
> the submitter, and it does not need to wait till 2.0. We should be able to
> merge it soon. BTW, you are encouraged to vote on these PRs so maintainers
> can prioritize their time.
>
> Max,
>
> Thanks for kicking off this thread.
>
> Regarding 2.0, we've associated feature deprecation and non-backward
> compatible changes with 2.0. Some of this work might be pretty
> earth-shaking to Airflow users. IMHO, changes that increase user pain at
> upgrade time need to be carefully balanced against value.
>
> Watching both Gitter and the email list, there are a variety of stumbling
> points (for new users) that many of us who have been using the product for
> 1-2 years have forgotten. A fair number of people still mention that
> getting Airflow up and running is no simple task - i.e. Alex mentioned this
> in his talk at the last meet-up. The recent BlueYonder talk referenced
> https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls
>
> Though we may be numerically near 2.0 in terms of release numbers, I'd
> prefer to prioritize a few things higher than releasing 2.0. We need to
> build and exercise a few necessary muscles: timely PR processing & timely
> Apache releases (i.e. quarterly). Beyond that, I'd like to prioritize the
> "common pitfall" problems to ease on-boarding. Some of these don't need to
> wait for a major release.
> The ones that do can be developed on a separate
> 2.0 branch and baked, reviewed, and voted on by the community before we
> consider dropping them into master.
>
> That way, we can keep master healthy to support the increasing rate of
> community-submitted PRs that we are seeing and reduce the cycle time of
> cutting stable releases, all while working on big-bang changes for 2.0
> independently.
>
> Just my $0.02
> -s
>
> On Fri, Nov 18, 2016 at 3:57 PM, Chris Riccomini <[email protected]>
> wrote:
>
> > > RIP out the charting application and the data profiler
> >
> > Yes please! +1
> >
> > On Fri, Nov 18, 2016 at 2:41 PM, Maxime Beauchemin
> > <[email protected]> wrote:
> > > Another point that may be controversial for Airflow 2.0: RIP out the
> > > charting application and the data profiler. Even though it's nice to
> > > have it there, it's just out of scope and has major security
> > > issues/implications.
> > >
> > > I'm not sure how popular it actually is. We may need to run a survey at
> > > some point around these kinds of questions.
> > >
> > > Max
> > >
> > > On Fri, Nov 18, 2016 at 2:39 PM, Maxime Beauchemin <
> > > [email protected]> wrote:
> > >
> > >> Using FAB's Model, we get pretty much all of that (REST API, auth/perms,
> > >> CRUD) for free:
> > >> http://flask-appbuilder.readthedocs.io/en/latest/quickhowto.html?highlight=rest#exposed-methods
> > >>
> > >> I'm pretty intimate with FAB since I use it (and contributed to it) for
> > >> Superset/Caravel.
> > >>
> > >> All that's needed is to derive from FAB's model class instead of
> > >> SQLAlchemy's model class (which FAB's model wraps and adds functionality
> > >> to and is 100% compatible AFAICT).
> > >>
> > >> Max
> > >>
> > >> On Fri, Nov 18, 2016 at 2:07 PM, Chris Riccomini <[email protected]>
> > >> wrote:
> > >>
> > >>> > It may be doable to run this as a different package
> > >>> > `airflow-webserver`, an alternate UI at first, and to eventually rip
> > >>> > out the old UI off of the main package.
> > >>>
> > >>> This is the same strategy that I was thinking of for AIRFLOW-85. You
> > >>> can build the new UI in parallel, and then delete the old one later. I
> > >>> really think that a REST interface should be a pre-req to any
> > >>> large/new UI changes, though. Getting unified so that everything is
> > >>> driven through REST will be a big win.
> > >>>
> > >>> On Fri, Nov 18, 2016 at 1:51 PM, Maxime Beauchemin
> > >>> <[email protected]> wrote:
> > >>> > A multi-tenant UI with composable roles on top of granular
> > >>> > permissions.
> > >>> >
> > >>> > Migrating from Flask-Admin to Flask App Builder would be an easy-ish
> > >>> > win (since they're both Flask). FAB provides a good authentication and
> > >>> > permission model that ships out-of-the-box with a REST API. It suffices
> > >>> > to define FAB models (derivative of SQLAlchemy's model) and you get a
> > >>> > set of perms for the model (can_show, can_list, can_add, can_change,
> > >>> > can_delete, ...) and a set of CRUD REST endpoints. It would also allow
> > >>> > us to rip out the authentication backend code out of Airflow and rely
> > >>> > on FAB for that. Also every single view gets permissions auto-created
> > >>> > for it, and there are easy ways to define row-level type filters based
> > >>> > on user permissions.
> > >>> >
> > >>> > It may be doable to run this as a different package
> > >>> > `airflow-webserver`, an alternate UI at first, and to eventually rip
> > >>> > out the old UI off of the main package.
> > >>> >
> > >>> > https://flask-appbuilder.readthedocs.io/en/latest/
> > >>> >
> > >>> > I'd love to carve out some time and lead this.
> > >>> >
> > >>> > On Fri, Nov 18, 2016 at 1:32 PM, Chris Riccomini <[email protected]>
> > >>> > wrote:
> > >>> >
> > >>> >> Full-fledged REST API (that the UI also uses) would be great in 2.0.
> > >>> >>
> > >>> >> On Fri, Nov 18, 2016 at 6:26 AM, David Kegley <[email protected]> wrote:
> > >>> >> > Hi All,
> > >>> >> >
> > >>> >> > We have been using Airflow heavily for the last couple months and
> > >>> >> > it’s been great so far. Here are a few things we’d like to see
> > >>> >> > prioritized in 2.0.
> > >>> >> >
> > >>> >> > 1) Role based access to DAGs:
> > >>> >> > We would like to see better role based access through the UI.
> > >>> >> > There’s a related ticket out there but it hasn’t seen any action in
> > >>> >> > a few months:
> > >>> >> > https://issues.apache.org/jira/browse/AIRFLOW-85
> > >>> >> >
> > >>> >> > We use a templating system to create/deploy DAGs dynamically based
> > >>> >> > on some directory/file structure. This allows analysts to quickly
> > >>> >> > deploy and schedule their ETL code without having to interact with
> > >>> >> > the Airflow installation directly. It would be great if those same
> > >>> >> > analysts could access their own DAGs in the UI so that they can
> > >>> >> > clear DAG runs, mark success, etc. while keeping them away from our
> > >>> >> > core ETL and other people's/organization's DAGs. Some of this can be
> > >>> >> > accomplished with ‘filter by owner’ but it doesn’t address the use
> > >>> >> > case where a DAG can be maintained by multiple users in the same
> > >>> >> > organization when they have separate Airflow user accounts.
> > >>> >> >
> > >>> >> > 2) An option to turn off backfill:
> > >>> >> > https://issues.apache.org/jira/browse/AIRFLOW-558
> > >>> >> > For cases where a DAG does an insert overwrite on a table every day.
> > >>> >> > This might be a realistic option for the current version but I just
> > >>> >> > wanted to call attention to this feature request.
> > >>> >> >
> > >>> >> > Best,
> > >>> >> > David
> > >>> >> >
> > >>> >> > On Nov 17, 2016, at 6:19 PM, Maxime Beauchemin <[email protected]>
> > >>> >> > wrote:
> > >>> >> >
> > >>> >> > *This is a brainstorm email thread about Airflow 2.0!*
> > >>> >> >
> > >>> >> > I wanted to share some ideas around what I would like to do in
> > >>> >> > Airflow 2.0 and would love to hear what others are thinking. I'll
> > >>> >> > compile the ideas that are shared in this thread in a Wiki once the
> > >>> >> > conversation fades.
> > >>> >> >
> > >>> >> > -------------------------------------------
> > >>> >> >
> > >>> >> > First idea, to get the conversation started:
> > >>> >> >
> > >>> >> > *Breaking down the package*
> > >>> >> > `pip install airflow-common airflow-scheduler airflow-webserver
> > >>> >> > airflow-operators-googlecloud ...`
> > >>> >> >
> > >>> >> > It seems to me like we're getting to a point where having different
> > >>> >> > repositories and different packages would make things much easier in
> > >>> >> > all sorts of ways. For instance the web server is a lot less
> > >>> >> > sensitive than the scheduler, and changes to operators should/could
> > >>> >> > be deployed at will, independently from the main package. People in
> > >>> >> > their environment could upgrade only certain packages when needed.
> > >>> >> > Travis builds would be more targeted, and take less time, ...
> > >>> >> >
> > >>> >> > Also, the whole current "extras_require" approach to optional
> > >>> >> > dependencies (in setup.py) is kind of getting out of hand.
> > >>> >> >
> > >>> >> > Of course `pip install airflow` would bring in a collection of
> > >>> >> > sub-packages similar in functionality to what it does now, perhaps
> > >>> >> > without so many operators you probably don't need in your
> > >>> >> > environment.
> > >>> >> >
> > >>> >> > The release process is the main pain-point and the biggest risk for
> > >>> >> > the project, and I feel like this is a solid solution to address it.
> > >>> >> >
> > >>> >> > Max
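
As a rough illustration of the meta-package idea discussed in this thread (not something any poster wrote), the sketch below builds the keyword arguments that a thin `airflow` distribution might pass to `setuptools.setup()`. The sub-package names are the ones floated in the thread and are not assumed to exist as published distributions. The `googlecloud` extra, the helper name `meta_package_kwargs`, and the version pins are assumptions made for the example.

```python
# Sketch only: sub-package names are taken from the thread; the extra name,
# the helper name, and the version pins are hypothetical.
CORE_PACKAGES = ["airflow-core", "airflow-operators", "airflow-webserver"]

# Hypothetical optional add-ons, each living in its own distribution
# instead of a long extras list inside one setup.py.
OPTIONAL_PACKAGES = {
    "googlecloud": ["airflow-operators-googlecloud"],
}


def meta_package_kwargs(version):
    """Return keyword arguments for setuptools.setup() describing a
    code-free `airflow` meta-package that only pulls in sub-packages."""

    def pin(name):
        # PEP 440 compatible-release pin: with a three-part version such
        # as "1.9.0", "~=1.9.0" accepts any 1.9.x, so sub-packages can
        # ship patch releases on their own schedule.
        return "{}~={}".format(name, version)

    return {
        "name": "airflow",
        "version": version,
        "packages": [],  # the meta-package ships no code of its own
        "install_requires": [pin(name) for name in CORE_PACKAGES],
        "extras_require": {
            extra: [pin(name) for name in names]
            for extra, names in OPTIONAL_PACKAGES.items()
        },
    }


kwargs = meta_package_kwargs("1.9.0")
print(kwargs["install_requires"])
# ['airflow-core~=1.9.0', 'airflow-operators~=1.9.0', 'airflow-webserver~=1.9.0']
```

Under these assumptions, `pip install airflow` would resolve to the three core sub-packages, while `pip install airflow[googlecloud]` would add the operator package on top. Keeping the pins as compatible-release ranges rather than exact versions is one way to let operators and the web server release more often than the core.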
