I think using namespaces should avoid many of the problems raised by
separate repos (who will manage them? how do we release? where should
each repo live?).
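
To illustrate what I mean by namespaces, here is a minimal sketch of native
(PEP 420) namespace packages. It assumes a made-up top-level name
`demo_providers` (not the real `airflow.providers`) and fakes two separately
installed distributions as temporary directories on `sys.path`; the helper
names are hypothetical too. Whether the same thing works on top of an
installed apache-airflow is the part we would still need to test.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_distribution(root: Path, provider: str) -> Path:
    """Create a fake 'distribution' directory that contributes one
    sub-package to the shared (hypothetical) 'demo_providers' namespace."""
    dist_dir = root / f"dist_{provider}"
    pkg_dir = dist_dir / "demo_providers" / provider
    pkg_dir.mkdir(parents=True)
    # There is deliberately no __init__.py in 'demo_providers' itself:
    # that is what makes it a native namespace package.
    (pkg_dir / "__init__.py").write_text(f"NAME = {provider!r}\n")
    return dist_dir


def load_providers(providers):
    """Put each fake distribution on sys.path and import its sub-package."""
    root = Path(tempfile.mkdtemp())  # left on disk for the life of the demo
    for provider in providers:
        sys.path.append(str(make_distribution(root, provider)))
    importlib.invalidate_caches()
    return {
        provider: importlib.import_module(f"demo_providers.{provider}").NAME
        for provider in providers
    }


loaded = load_providers(["google", "amazon"])
print(loaded)
```

Both sub-packages import from one logical package even though they live in
separate directories, which is what would let each provider ship as its own
pip package from a single repo.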

Best,
Tomek

On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:

> Thanks Bas for comments! Let me share my thoughts below.
>
> On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak <
> basharens...@godatadriven.com>
> wrote:
>
> > Hi Jarek, I definitely see a future in creating separate installable
> > packages for various operators/hooks/etc (as in AIP-8). This would IMO
> > strip the “core” Airflow to only what’s needed and result in a small
> > package without a ton of dependencies (and make it more maintainable,
> > shorter tests, etc etc etc). Not exactly sure though what you’re proposing
> > in your e-mail, is it a new AIP for an intermediate step towards AIP-8?
> >
>
> It's a new AIP I am proposing.  For now it's only for backporting the new
> 2.0 import paths to 1.10.* series.
>
> It's more an incremental step in the direction of AIP-8, learning about the
> difficulties involved, than a full implementation of AIP-8. We are taking
> advantage of changes in import paths from AIP-21 which make it possible to
> have both old and new (optional) operators available in 1.10.* series of
> Airflow. I think there is a lot more to do for a full implementation of
> AIP-8: decisions on how to maintain and install those operator groups
> separately, a stewardship model/organisation for the separate groups, how to
> manage cross-dependencies, procedures for releasing the packages, etc.
>
> I think about this new AIP also as a learning effort - we would learn more
> how separate packaging works and then we can follow up with AIP-8 full
> implementation for "modular" Airflow. Then AIP-8 could be implemented in
> Airflow 2.1 for example - or 3.0 if we start following semantic versioning
> - based on those learnings. It's a good example of having our cake and
> eating it too. We can try out modularity in 1.10.* while cutting the scope
> of 2.0 and not implementing full management/release procedure for AIP-8
> yet.
>
>
> > Thinking about this, I think there are still a few grey areas (which would
> > be good to discuss in a new AIP, or continue on AIP-8):
> >
> >   *   In your email you speak only about the 3 big cloud providers
> > (btw I made a PR for migrating all AWS components ->
> > https://github.com/apache/airflow/pull/6439). Is there a plan for
> > splitting other components than Google/AWS/Azure?
> >
>
> We could add more groups as part of this new AIP indeed (as an extension to
> AIP-21 and pre-requisite to AIP-8). We already see how moving/deprecation
> works for the providers package - it works for GCP/Google rather nicely.
> But there is nothing to prevent us from extending it to cover other groups
> of operators/hooks. If you look at the current structure of documentation
> done by Kamil, we can follow the structure there and move the
> operators/hooks accordingly (
> https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html):
>
>       Fundamentals, ASF: Apache Software Foundation, Azure: Microsoft
> Azure, AWS: Amazon Web Services, GCP: Google Cloud Platform, Service
> integrations, Software integrations, Protocol integrations.
>
> I am happy to include that in the AIP - if others agree it's a good idea.
> Out of those groups -  I think only Fundamentals should not be back-ported.
> Others should be rather easy to port (if we decide to). We already have
> quite a lot of those in the new GCP operators for 2.0. So starting with
> GCP/Google group is a good idea. Also following with Cloud Providers first
> is a good thing. For example we have now support from Google Composer team
> to do this separation for GCP (and we learn from it) and then we can claim
> the stewardship in our team for releasing the python 3/ Airflow
> 1.10-compatible "airflow-google" packages. Possibly other Cloud
> Providers/teams might follow this (if they see the value in it) and there
> could be different stewards for those. And then we can do other groups if
> we decide to. I think this way we can learn whether AIP-8 is manageable and
> what real problems we are going to face.
>
> >   *   Each “plugin” e.g. GCP would be a separate repo, should we create
> > some sort of blueprint for such packages?
> >
>
> I think we do not need separate repos (at all) but in this new AIP we can
> test it before we decide to go for AIP-8. IMHO - monorepo approach will
> work here rather nicely. We could use python-3 native namespaces
> <https://packaging.python.org/guides/packaging-namespace-packages/> for the
> sub-packages when we go full AIP-8. For now we could simply package the new
> operators in a separate pip package for the Python 3 version of the 1.10.*
> series only.
> We only need to test if it works well with another package providing
> 'airflow.providers.*' after apache-airflow is installed (providing
> 'airflow' package). But I think we can make it work. I don't think we
> really need to split the repos; namespaces will work just fine and make
> cross-repository dependencies easier to manage (but we can learn
> otherwise). For sure we will not need it for the new proposed AIP of
> backporting groups to 1.10 and we can defer that decision to AIP-8
> implementation time.
>
>
> >   *   In which Airflow version do we start raising deprecation warnings
> > and in which version would we remove the original?
> >
>
> I think we should do what we did in GCP case already. Those old "imports"
> for operators can be marked as deprecated in Airflow 2.0 (and removed in 2.1
> or 3.0 if we start following semantic versioning). We can however do it
> earlier, in 1.10.7 or 1.10.8 if we release those (without removing the old
> operators yet - just raise deprecation warnings and inform users that for
> python3 the new "airflow-google", "airflow-aws" etc. packages can be
> installed and that they can switch to them).
>
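
A rough sketch of what "raise deprecation warnings but keep the old imports
working" could look like. Everything here is made up for illustration: the
module paths `demo_legacy_storage` and `demo_providers.google.storage`, the
`StorageOperator` stand-in and the `install_deprecated_alias` helper are
hypothetical, and this is not necessarily how the GCP operators do it. It
relies on a module-level `__getattr__` (PEP 562, Python 3.7+), so the sketch
is Python 3 only.

```python
import sys
import types
import warnings


class StorageOperator:
    """Stand-in for an operator that already lives at its new import path."""


def install_deprecated_alias(old_path, new_path, **moved):
    """Register a module at old_path that forwards to the moved objects.

    Accessing a forwarded name emits a DeprecationWarning naming new_path.
    """
    module = types.ModuleType(old_path)

    def module_getattr(name):
        if name not in moved:
            raise AttributeError(f"module {old_path!r} has no attribute {name!r}")
        warnings.warn(
            f"{old_path}.{name} is deprecated; use {new_path}.{name} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return moved[name]

    # PEP 562: a module-level __getattr__ handles names missing from the module.
    module.__getattr__ = module_getattr
    sys.modules[old_path] = module
    return module


install_deprecated_alias(
    "demo_legacy_storage",            # hypothetical old import path
    "demo_providers.google.storage",  # hypothetical new import path
    StorageOperator=StorageOperator,
)

# Record warnings so the demo can show what a user of the old path would see.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from demo_legacy_storage import StorageOperator as LegacyStorageOperator

print(LegacyStorageOperator is StorageOperator)
print([str(w.message) for w in caught])
```

The old name resolves to the very same class, so existing DAGs would keep
working while the warning points at the new location.
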
> J.
>
>
> >
> > Cheers,
> > Bas
> >
> > On 27 Oct 2019, at 08:33, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >
> > Hello - any comments on that? I am happy to make it into an AIP :)?
> >
> > On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <jarek.pot...@polidea.com>
> > wrote:
> >
> > *Motivation*
> >
> > I think we really should start thinking about making it easier to migrate
> > to 2.0 for our users. After implementing some recent changes related to
> > AIP-21 - Changes in import paths
> > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
> > I think I have an idea that might help with it.
> >
> > *Proposal*
> >
> > We could package some of the new and improved 2.0 operators (moved to
> > "providers" package) and let them be used in Python 3 environment of
> > airflow 1.10.x.
> >
> > This can be done case-by-case per "cloud provider". It should not be
> > obligatory, should be largely driven by each provider. It's not yet full
> > AIP-8 - Split Hooks/Operators into separate packages
> > <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303>.
> > It's merely backporting of some operators/hooks to get them to work in
> > 1.10. But by doing it we might try out the concept of splitting, learn
> > about maintenance problems and maybe implement full *AIP-8* approach in 2.1
> > consistently across the board.
> >
> > *Context*
> >
> > Part of the AIP-21 was to move import paths for Cloud providers to
> > separate providers/<PROVIDER> package. An example for that (the first
> > provider we already almost migrated) was providers/google package (further
> > divided into gcp/gsuite etc).
> >
> > We've done a massive migration of all the Google-related operators,
> > created a few missing ones and retrofitted some old operators to follow GCP
> > best practices, fixing a number of problems and also implementing Python 3
> > and Pylint compatibility. Some of these operators/hooks are not backwards
> > compatible. Those that are compatible are still available via the old
> > imports with a deprecation warning.
> >
> > We've added missing tests (including system tests) and missing features -
> > improving some of the Google operators - giving the users more capabilities
> > and fixing some issues. Those operators should pretty much "just work" in
> > Airflow 1.10.x (any recent version) for Python 3. We should be able to
> > release a separate pip-installable package for those operators that users
> > should be able to install in Airflow 1.10.x.
> >
> > Any user will be able to install this separate package in their Airflow
> > 1.10.x installation and start using those new "provider" operators in
> > parallel to the old 1.10.x operators. Other providers ("microsoft",
> > "amazon") might follow the same approach if they want. We could even at
> > some point decide to move some of the core operators in similar fashion
> > (for example following the structure proposed in the latest documentation:
> > fundamentals / software / etc.
> > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
> >
> > *Pros and cons*
> >
> > There are a number of pros:
> >
> >   - Users will have an easier migration path if they are deeply invested
> >   in the 1.10.* version
> >   - It's possible to migrate in stages for people who are also invested in
> >   py2: *py2 (1.10) -> py3 (1.10) -> py3 + new operators (1.10) -> py3 +
> >   2.0*
> >   - Moving to new operators in the "py3 + new operators" stage can be done
> >   gradually. Old operators will continue to work while new ones can be
> >   used more and more
> >   - People will get incentivised to migrate to python 3 before 2.0 is
> >   out (by using new operators)
> >   - Each provider "package" can have independent release schedule - and
> >   add functionality in already released Airflow versions.
> >   - We do not take out any functionality from the users - we just add
> >   more options
> >   - The releases can be - similarly to main airflow releases - voted
> >   separately by PMC after "stewards" of the package (per provider) perform
> >   a round of testing on 1.10.* versions.
> >   - Users will start migrating to new operators earlier and have
> >   smoother switch to 2.0 later
> >   - The latest improved operators will start
> >
> > There are three cons I could think of:
> >
> >   - There will be quite a lot of duplication between old and new
> >   operators (they will co-exist in 1.10). That might lead to confusion of
> >   users and problems with cooperation between different operators/hooks
> >   - Having new operators in 1.10 python 3 might keep people from
> >   migrating to 2.0
> >   - It will require some maintenance and separate release overhead.
> >
> > I already spoke to Composer team @Google and they are very positive about
> > this. I also spoke to Ash and it seems it might also be OK for the
> > Astronomer team. We have Google's backing and support, and we can provide
> > maintenance and support for those packages - being an example for other
> > providers of how they can do it.
> >
> > Let me know what you think - and whether I should make it into an official
> > AIP maybe?
> >
> > J.
> >
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >
> >
> >
> >
>


-- 

Tomasz Urbaszek
Polidea <https://www.polidea.com/> | Junior Software Engineer

M: +48 505 628 493 <+48505628493>
E: tomasz.urbas...@polidea.com <tomasz.urbasz...@polidea.com>

Unique Tech
Check out our projects! <https://www.polidea.com/our-work>
