I think utilizing namespaces should avoid a lot of the problems raised by using separate repos (who will manage them? how do we release? where should the repos live?).
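To illustrate what I mean by namespaces, here is a minimal sketch of Python 3 native namespace packages (PEP 420). Everything in it is hypothetical: `demo_ns`, `dist_core` and `dist_providers` are stand-in names, not real Airflow packages. It only shows that two separately installed directory trees can contribute sub-packages to one top-level name, as long as the shared top-level directory has no `__init__.py`. Whether this works unchanged on top of an installed apache-airflow still has to be tested.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_distribution(root: Path, subpackage: str, value: str) -> None:
    """Create root/demo_ns/<subpackage>/__init__.py.

    The shared top-level directory demo_ns deliberately gets no __init__.py,
    which is what makes it a native namespace package.
    """
    pkg_dir = root / "demo_ns" / subpackage
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text(f"NAME = {value!r}\n")


def load_from_two_distributions():
    """Simulate two separately installed packages sharing one namespace."""
    base = Path(tempfile.mkdtemp())
    core_root = base / "dist_core"            # hypothetical "core" distribution
    providers_root = base / "dist_providers"  # hypothetical "providers" distribution
    make_distribution(core_root, "core", "core")
    make_distribution(providers_root, "providers", "providers")

    # Put both roots on the import path, as two separate installs would.
    sys.path[:0] = [str(core_root), str(providers_root)]
    importlib.invalidate_caches()

    core = importlib.import_module("demo_ns.core")
    providers = importlib.import_module("demo_ns.providers")
    return core.NAME, providers.NAME
```

Calling `load_from_two_distributions()` imports both sub-packages even though they live in different directories, which is the property a monorepo with several pip packages would rely on.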
Bests,
Tomek

On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:

> Thanks Bas for comments! Let me share my thoughts below.
>
> On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak
> <basharens...@godatadriven.com> wrote:
>
> > Hi Jarek, I definitely see a future in creating separate installable
> > packages for various operators/hooks/etc (as in AIP-8). This would IMO
> > strip the “core” Airflow to only what’s needed and result in a small
> > package without a ton of dependencies (and make it more maintainable,
> > shorter tests, etc etc etc). Not exactly sure though what you’re
> > proposing in your e-mail, is it a new AIP for an intermediate step
> > towards AIP-8?
>
> It's a new AIP I am proposing. For now it's only for backporting the new
> 2.0 import paths to 1.10.* series.
>
> It's more of "incremental going in direction of AIP-8 and learning some
> difficulties involved" than implementing AIP-8 fully. We are taking
> advantage of changes in import paths from AIP-21 which make it possible to
> have both old and new (optional) operators available in 1.10.* series of
> Airflow. I think there is a lot more to do for full implementation of
> AIP-8: decisions how to maintain, install those operator groups separately,
> stewardship model/organisation for the separate groups, how to manage
> cross-dependencies, procedures for releasing the packages etc.
>
> I think about this new AIP also as a learning effort - we would learn more
> how separate packaging works and then we can follow up with AIP-8 full
> implementation for "modular" Airflow. Then AIP-8 could be implemented in
> Airflow 2.1 for example - or 3.0 if we start following semantic versioning
> - based on those learnings. It's a bit of good example of having cake and
> eating it too. We can try out modularity in 1.10.* while cutting the scope
> of 2.0 and not implementing full management/release procedure for AIP-8
> yet.
>
> > Thinking about this, I think there are still a few grey areas (which
> > would be good to discuss in a new AIP, or continue on AIP-8):
> >
> >   * In your email you speak only about the 3 big cloud providers
> >     (btw I made a PR for migrating all AWS components ->
> >     https://github.com/apache/airflow/pull/6439). Is there a plan for
> >     splitting other components than Google/AWS/Azure?
>
> We could add more groups as part of this new AIP indeed (as an extension to
> AIP-21 and pre-requisite to AIP-8). We already see how moving/deprecation
> works for the providers package - it works for GCP/Google rather nicely.
> But there is nothing to prevent us from extending it to cover other groups
> of operators/hooks. If you look at the current structure of documentation
> done by Kamil, we can follow the structure there and move the
> operators/hooks accordingly
> (https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html):
>
>     Fundamentals, ASF: Apache Software Foundation, Azure: Microsoft
>     Azure, AWS: Amazon Web Services, GCP: Google Cloud Platform, Service
>     integrations, Software integrations, Protocol integrations.
>
> I am happy to include that in the AIP - if others agree it's a good idea.
> Out of those groups - I think only Fundamentals should not be back-ported.
> Others should be rather easy to port (if we decide to). We already have
> quite a lot of those in the new GCP operators for 2.0. So starting with
> GCP/Google group is a good idea. Also following with Cloud Providers first
> is a good thing. For example we have now support from Google Composer team
> to do this separation for GCP (and we learn from it) and then we can claim
> the stewardship in our team for releasing the python 3/ Airflow
> 1.10-compatible "airflow-google" packages. Possibly other Cloud
> Providers/teams might follow this (if they see the value in it) and there
> could be different stewards for those. And then we can do other groups if
> we decide to.
> I think this way we can learn whether AIP-8 is manageable and
> what real problems we are going to face.
>
> >   * Each “plugin” e.g. GCP would be a separate repo, should we create
> >     some sort of blueprint for such packages?
>
> I think we do not need separate repos (at all) but in this new AIP we can
> test it before we decide to go for AIP-8. IMHO - monorepo approach will
> work here rather nicely. We could use python-3 native namespaces
> <https://packaging.python.org/guides/packaging-namespace-packages/> for the
> sub-packages when we go full AIP-8. For now we could simply package the new
> operators in separate pip package for Python 3 version 1.10.* series only.
> We only need to test if it works well with another package providing
> 'airflow.providers.*' after apache-airflow is installed (providing
> 'airflow' package). But I think we can make it work. I don't think we
> really need to split the repos, namespaces will work just fine and have
> easier management of cross-repository dependencies (but we can learn
> otherwise). For sure we will not need it for the new proposed AIP of
> backporting groups to 1.10 and we can defer that decision to AIP-8
> implementation time.
>
> >   * In which Airflow version do we start raising deprecation warnings
> >     and in which version would we remove the original?
>
> I think we should do what we did in GCP case already. Those old "imports"
> for operators can be marked as deprecated in Airflow 2.0 (and removed in 2.1
> or 3.0 if we start following semantic versioning). We can however do it
> before in 1.10.7 or 1.10.8 if we release those (without removing the old
> operators yet - just raise deprecation warnings and inform that for python3
> the new "airflow-google", "airflow-aws" etc. packages can be installed and
> users can switch to it).
>
> J.
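For the deprecation step described above (keep the old operator usable, only raise a warning that points at the new one), a minimal sketch could look like the following. This is an illustration under assumptions, not the code that is in the Airflow repository: `NewStyleOperator` and `OldStyleOperator` are hypothetical stand-ins for an operator at its new import path and at its old one.

```python
import warnings


class NewStyleOperator:
    """Hypothetical stand-in for an operator living at the new import path."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id


class OldStyleOperator(NewStyleOperator):
    """Hypothetical stand-in for the old import path.

    It behaves exactly like the new operator, but warns on construction
    so users learn where to migrate before the old name is removed.
    """

    def __init__(self, *args, **kwargs) -> None:
        warnings.warn(
            "OldStyleOperator is deprecated; use NewStyleOperator instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        super().__init__(*args, **kwargs)
```

Existing DAG code that builds `OldStyleOperator(task_id="t")` keeps working and gets a `DeprecationWarning`, so removal can be deferred to a later release without breaking anyone in between.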
>
> > Cheers,
> > Bas
> >
> > On 27 Oct 2019, at 08:33, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >
> > Hello - any comments on that? I am happy to make it into an AIP :)?
> >
> > On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <jarek.pot...@polidea.com>
> > wrote:
> >
> > *Motivation*
> >
> > I think we really should start thinking about making it easier to migrate
> > to 2.0 for our users. After implementing some recent changes related to
> > AIP-21 - Changes in import paths
> > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
> > I think I have an idea that might help with it.
> >
> > *Proposal*
> >
> > We could package some of the new and improved 2.0 operators (moved to
> > "providers" package) and let them be used in Python 3 environment of
> > airflow 1.10.x.
> >
> > This can be done case-by-case per "cloud provider". It should not be
> > obligatory, should be largely driven by each provider. It's not yet full
> > AIP-8 Split Hooks/Operators into separate packages
> > <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303>.
> > It's merely backporting of some operators/hooks to get it work in 1.10.
> > But by doing it we might try out the concept of splitting, learn about
> > maintenance problems and maybe implement full *AIP-8* approach in 2.1
> > consistently across the board.
> >
> > *Context*
> >
> > Part of the AIP-21 was to move import paths for Cloud providers to
> > separate providers/<PROVIDER> package. An example for that (the first
> > provider we already almost migrated) was providers/google package
> > (further divided into gcp/gsuite etc).
> >
> > We've done a massive migration of all the Google-related operators,
> > created a few missing ones and retrofitted some old operators to follow
> > GCP best practices and fixing a number of problems - also implementing
> > Python3 and Pylint compatibility. Some of these operators/hooks are not
> > backwards compatible. Those that are compatible are still available via
> > the old imports with deprecation warning.
> >
> > We've added missing tests (including system tests) and missing features -
> > improving some of the Google operators - giving the users more
> > capabilities and fixing some issues. Those operators should pretty much
> > "just work" in Airflow 1.10.x (any recent version) for Python 3. We
> > should be able to release a separate pip-installable package for those
> > operators that users should be able to install in Airflow 1.10.x.
> >
> > Any user will be able to install this separate package in their Airflow
> > 1.10.x installation and start using those new "provider" operators in
> > parallel to the old 1.10.x operators. Other providers ("microsoft",
> > "amazon") might follow the same approach if they want. We could even at
> > some point decide to move some of the core operators in similar fashion
> > (for example following the structure proposed in the latest
> > documentation: fundamentals / software / etc.
> > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
> >
> > *Pros and cons*
> >
> > There are a number of pros:
> >
> >    - Users will have an easier migration path if they are deeply
> >      invested in 1.10.* version
> >    - It's possible to migrate in stages for people who are also invested
> >      in py2: *py2 (1.10) -> py3 (1.10) -> py3 + new operators (1.10) ->
> >      py3 + 2.0*
> >    - Moving to new operators in py3 + new operators can be done
> >      gradually.
> >      Old operators will continue to work while new can be used more
> >      and more
> >    - People will get incentivised to migrate to python 3 before 2.0 is
> >      out (by using new operators)
> >    - Each provider "package" can have independent release schedule - and
> >      add functionality in already released Airflow versions.
> >    - We do not take out any functionality from the users - we just add
> >      more options
> >    - The releases can be - similarly as main airflow releases - voted
> >      separately by PMC after "stewards" of the package (per provider)
> >      perform round of testing on 1.10.* versions.
> >    - Users will start migrating to new operators earlier and have
> >      smoother switch to 2.0 later
> >    - The latest improved operators will start
> >
> > There are three cons I could think of:
> >
> >    - There will be quite a lot of duplication between old and new
> >      operators (they will co-exist in 1.10). That might lead to
> >      confusion of users and problems with cooperation between different
> >      operators/hooks
> >    - Having new operators in 1.10 python 3 might keep people from
> >      migrating to 2.0
> >    - It will require some maintenance and separate release overhead.
> >
> > I already spoke to Composer team @Google and they are very positive about
> > this. I also spoke to Ash and seems it might also be OK for Astronomer
> > team. We have Google's backing and support, and we can provide
> > maintenance and support for those packages - being an example for other
> > providers how they can do it.
> >
> > Let me know what you think - and whether I should make it into an
> > official AIP maybe?
> >
> > J.
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>

--

Tomasz Urbaszek
Polidea <https://www.polidea.com/> | Junior Software Engineer

M: +48 505 628 493 <+48505628493>
E: tomasz.urbas...@polidea.com <tomasz.urbasz...@polidea.com>