How about going both routes?
1) Provide one big "backport" package for 1.10.
2) Once we release 2.0, split providers into micro-packages.
J.

On Fri, Feb 14, 2020 at 9:30 PM Ash Berlin-Taylor <[email protected]> wrote:
> I think before we take this discussion any further we should work out what
> our plan is for AIP-8:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303&src=contextnavpagetreemode
> (though it likely needs updating, as it still talks about contrib, which isn't
> relevant anymore).
>
> AIP-8 talks about "One hook or operator per package, following the "micro
> package" philosophy." as its long-term goal, and I think I broadly agree
> with that.
> Given we have almost everything in place to do this, I would rather
> we didn't release a single large "backport" package, only to have
> users then switch over to using new packages.
>
> > We can follow the same process/keys etc. as for releasing the main airflow
> > package, but I think it can be a bit more relaxed in terms of testing - and
> > we can release it more often (as long as there are new changes in
> > providers). Those packages might be released on an "as-is" basis - without
> > a guarantee that they work for all operators/hooks/sensors - and without
> > a guarantee that they will work for all 1.10.* versions.
>
> I'm in favour of this as a general idea.
> My preferred way is to have each "provider" be its own package. This is a
> slightly fuzzy concept, as for instance airflow.providers.google probably
> makes sense as a single package rather than separate .google.cloud,
> .google.marketing, etc. packages (as per Kamil's comment on GitHub), but
> airflow.providers.apache should _not_ be one package. So there's no
> easily expressible rule here, but (to me) there is an obvious way for each
> case.
> Anyway, is the goal to provide smaller releases of providers, as Terraform
> does, or to backport to make 2.0 adoption easier?
> -a
>
> On Feb 11 2020, at 3:43 pm, Jarek Potiuk <[email protected]> wrote:
> > Any more opinions?
> >
> > I gave some thought to that and I think we should:
> > 1) release one big providers* package with CalVer versioning -
> > apache-airflow-providers-backport-2020.02.11 if we were to release it today
> > (we can always break it into smaller packages when we decide in 2.0).
> > And then we could change the package names.
> > 2) scheduled or regular releases. We should release them as needed - i.e.
> > if we have a large change in one or a few of the providers, or a serious
> > bugfix, we can release it again.
> > 3) it should be a manual effort involving voting and PMC approvals.
> >
> > What do you think?
> > J.
> >
> > On Mon, Feb 10, 2020 at 2:43 PM Tomasz Urbaszek <[email protected]> wrote:
> > > I am ok with users building their own packages.
> > > T.
> > >
> > > On Mon, Feb 10, 2020 at 1:47 PM Jarek Potiuk <[email protected]> wrote:
> > > > I think it should be a deliberate effort for releasing - with voting. We
> > > > are releasing the source code and IMHO it should follow the same rules as
> > > > releasing airflow itself.
> > > > With this change - anyone will be able to build and prepare their own .whl
> > > > packages and install them locally, so I do not think there is a need to
> > > > automatically release those packages?
> > > >
> > > > However, releasing them in PyPI should be quite an important event, as PyPI
> > > > releases are supposed to be used by users, not developers.
> > > >
> > > > J.
> > > >
> > > > On Mon, Feb 10, 2020 at 11:16 AM Tomasz Urbaszek <[email protected]> wrote:
> > > > > I think as long as we follow:
> > > > > > The only people who are supposed to know about such developer resources
> > > > > > are individuals actively participating in development or following the dev
> > > > > > list and thus aware of the conditions placed on unreleased materials.
> > > > >
> > > > > we should be ok.
> > > > > My impression is that people are usually aware of
> > > > > what "nightly build" means and what the risks are. But it's just a
> > > > > suggestion I made thinking about all those people who contribute
> > > > > integrations and can't use them "officially" for, let's say, the next 2
> > > > > months. I was also thinking about this result:
> > > > > https://www.digitalocean.com/currents/december-2019/#generational-expectations-for-open-source-maintenance
> > > > > :)
> > > > >
> > > > > T.
> > > > >
> > > > > On Mon, Feb 10, 2020 at 10:52 AM Ash Berlin-Taylor <[email protected]> wrote:
> > > > > > That might be a grey area according to my reading of the Apache release
> > > > > > policies:
> > > > > > https://apache.org/legal/release-policy.html#publication
> > > > > > > During the process of developing software and preparing a release,
> > > > > > > various packages are made available to the development community for
> > > > > > > testing purposes. Projects MUST direct outsiders towards official releases
> > > > > > > rather than raw source repositories, nightly builds, snapshots, release
> > > > > > > candidates, or any other similar packages. The only people who are supposed
> > > > > > > to know about such developer resources are individuals actively
> > > > > > > participating in development or following the dev list and thus aware of
> > > > > > > the conditions placed on unreleased materials.
> > > > > >
> > > > > > On Feb 10 2020, at 9:49 am, Tomasz Urbaszek <[email protected]> wrote:
> > > > > > > As for the frequency of releases, maybe we can consider "nightly
> > > > > > > builds" for providers? This way any contributed hook/operator will
> > > > > > > be pip-installable within 24h, so users can start to use it (i.e. test it).
> > > > > > > This can help us reduce the number of releases with non-working
> > > > > > > integrations.
> > > > > > >
> > > > > > > Tomek
> > > > > > >
> > > > > > > On Mon, Feb 10, 2020 at 12:11 AM Jarek Potiuk <[email protected]> wrote:
> > > > > > > > TL;DR: I wanted to discuss the approach we are going to take for the
> > > > > > > > backported providers packages. This is important for PMCs to decide
> > > > > > > > how we are going to run the release process for them, but I wanted to
> > > > > > > > make it a public discussion so that anyone else can chime in and we can
> > > > > > > > discuss it as a community.
> > > > > > > >
> > > > > > > > *Context*
> > > > > > > > As explained in the other thread - we are close to having releasable/tested
> > > > > > > > backport packages for the Airflow 1.10.* series for "providers"
> > > > > > > > operators/hooks/packages. The main purpose of those backport packages is to
> > > > > > > > let users migrate to the new operators before they migrate to the 2.0.*
> > > > > > > > version of Airflow.
> > > > > > > >
> > > > > > > > The 2.0 version is still some time in the future, and we have a number of
> > > > > > > > operators/hooks/sensors implemented that are not actively used/tested
> > > > > > > > because they are only in the master version.
> > > > > > > > There are a number of changes and fixes
> > > > > > > > implemented only in master/2.0, so it would be great to use them in 1.10 -
> > > > > > > > to use the new features but also to test the master versions as early as
> > > > > > > > possible.
> > > > > > > >
> > > > > > > > Another great property of the backport packages is that they can be used to
> > > > > > > > ease the migration process - users can install the "apache-airflow-providers"
> > > > > > > > package and start using the new operators without migrating to a new
> > > > > > > > Airflow. They can incrementally move all their DAGs to use the new
> > > > > > > > "providers" package, and only after all that is migrated can they
> > > > > > > > migrate Airflow to 2.0, when they are ready. That gives those users a
> > > > > > > > smooth migration path.
> > > > > > > >
> > > > > > > > *Testing*
> > > > > > > > The issue we have with those packages is that we are not 100% sure if the
> > > > > > > > "providers" operators will work with every 1.10.* airflow version. There were
> > > > > > > > no fundamental changes and they SHOULD work - but we never know until we
> > > > > > > > test.
> > > > > > > >
> > > > > > > > Some preliminary tests with a subset of GCP operators show that the operators
> > > > > > > > work out of the box.
> > > > > > > > We have a big set of "system" tests for "GCP"
> > > > > > > > operators that we will run semi-automatically to make sure that all GCP
> > > > > > > > operators are working fine. This is already a great compatibility test (GCP
> > > > > > > > operators are about 1/3 of all operators for Airflow). But also the
> > > > > > > > approach used in GCP system tests can be applied to other operators.
> > > > > > > >
> > > > > > > > I plan to have a matrix of "compatibilities" in
> > > > > > > > https://cwiki.apache.org/confluence/display/AIRFLOW/Backported+providers+packages+for+Airflow+1.10.*+series
> > > > > > > > and ask the community to add/run tests with other packages as well. It
> > > > > > > > should be rather easy to add system tests for other systems - following
> > > > > > > > the way it is implemented for GCP.
> > > > > > > >
> > > > > > > > *Releases*
> > > > > > > > I think the most important decision is how we are going to release the
> > > > > > > > packages. This is where PMCs have to decide, I think, as we have legal
> > > > > > > > responsibility for releasing official Apache Airflow software.
> > > > > > > >
> > > > > > > > What we have now (after the PRs get merged): wheel and source packages
> > > > > > > > are built automatically in Travis CI and uploaded to file.io ephemeral
> > > > > > > > storage.
> > > > > > > > The builds upload all the packages there - one big "providers" package and
> > > > > > > > separate packages for each "provider".
> > > > > > > >
> > > > > > > > It would be great, however, if we could officially publish the backport
> > > > > > > > packages in PyPI, and this is where we have to agree on the
> > > > > > > > process/versioning/cadence.
> > > > > > > >
> > > > > > > > We can follow the same process/keys etc. as for releasing the main airflow
> > > > > > > > package, but I think it can be a bit more relaxed in terms of testing - and
> > > > > > > > we can release it more often (as long as there are new changes in
> > > > > > > > providers). Those packages might be released on an "as-is" basis - without
> > > > > > > > a guarantee that they work for all operators/hooks/sensors - and without
> > > > > > > > a guarantee that they will work for all 1.10.* versions. We can have the
> > > > > > > > "compatibility" statement/matrix in our wiki, where people who have tested
> > > > > > > > some package might simply state that it works for them. At Polidea we can
> > > > > > > > assume stewardship of the GCP packages and test them using our automated
> > > > > > > > system tests for every release, for example - maybe others can assume
> > > > > > > > stewardship of other providers.
> > > > > > > >
> > > > > > > > For that - we will need some versioning/release policy. I
I > would > > > say > > > > > a CalVer > > > > > > > > <https://calver.org/> approach might work best > (YYYY.MM.DD). And > > > > > > > > > > > > > > > > > > > > > > > > > > to > > > > > make it > > > > > > > > simple we should release one "big" providers package with all > > > > > > > > > > > > > > > > > > > > > > > providers in. > > > > > > > > We can have roughly monthly cadence for it. > > > > > > > > > > > > > > > > But I am also open to any suggestions here. > > > > > > > > Please let me know what you think. > > > > > > > > J. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Jarek Potiuk > > > > > > > > Polidea <https://www.polidea.com/> | Principal Software > Engineer > > > > > > > > > > > > > > > > M: +48 660 796 129 <+48660796129> > > > > > > > > [image: Polidea] <https://www.polidea.com/> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Tomasz Urbaszek > > > > > > > Polidea | Software Engineer > > > > > > > > > > > > > > M: +48 505 628 493 > > > > > > > E: [email protected] > > > > > > > > > > > > > > Unique Tech > > > > > > > Check out our projects! > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Tomasz Urbaszek > > > > > Polidea | Software Engineer > > > > > > > > > > M: +48 505 628 493 > > > > > E: [email protected] > > > > > > > > > > Unique Tech > > > > > Check out our projects! > > > > > > > > > > > > > > > > > -- > > > > Jarek Potiuk > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer > > > > > > > > M: +48 660 796 129 <+48660796129> > > > > [image: Polidea] <https://www.polidea.com/> > > > > > > > > > > > > > -- > > > Tomasz Urbaszek > > > Polidea <https://www.polidea.com/> | Software Engineer > > > > > > M: +48 505 628 493 <+48505628493> > > > E: [email protected] <[email protected]> > > > > > > Unique Tech > > > Check out our projects! 
--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer
M: +48 660 796 129
