Others? WDYT? Shall we start voting on it? Any more comments? I would like to propose an interim solution: all the backported packages for 1.10 will be released as a single big package with CalVer versioning, together with a compatibility matrix in which we mark which of the providers were tested (semi-automatically?), and possibly, over time, automatically using system tests (following the AIP-4 proposal).
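A minimal sketch of the interim scheme proposed above, assuming Python. It does three things:
- builds a zero-padded YYYY.MM.DD CalVer string;
- shows how PEP 440 would normalize that string;
- keeps a toy compatibility matrix recording which providers were tested on which 1.10.* versions.

The package name comes from the example earlier in this thread. The provider names, the 1.10.9 entry, and the helper functions are hypothetical placeholders. They are not an agreed API or real test results.

```python
from datetime import date
from typing import Dict, List, Set

# Hypothetical: name taken from the example in this thread, not a published package.
PACKAGE_NAME = "apache-airflow-providers-backport"

# Hypothetical compatibility matrix: provider -> Airflow 1.10.* versions it was tested with.
CompatibilityMatrix = Dict[str, Set[str]]


def calver_version(release_date: date) -> str:
    """Return the zero-padded YYYY.MM.DD CalVer string for a release date."""
    return release_date.strftime("%Y.%m.%d")


def pep440_normalize(version: str) -> str:
    """Drop leading zeros from each numeric segment, as PEP 440 normalization does.

    Only handles plain dotted-integer release segments such as "2020.02.11".
    """
    return ".".join(str(int(segment)) for segment in version.split("."))


def untested(matrix: CompatibilityMatrix, providers: List[str], airflow_version: str) -> List[str]:
    """Return the providers with no recorded test against the given Airflow version."""
    return [p for p in providers if airflow_version not in matrix.get(p, set())]


if __name__ == "__main__":
    version = calver_version(date(2020, 2, 11))
    print(f"{PACKAGE_NAME}=={version}")  # apache-airflow-providers-backport==2020.02.11
    print(pep440_normalize(version))  # 2020.2.11

    # Placeholder entries for illustration only, not real test results.
    matrix: CompatibilityMatrix = {"google": {"1.10.9"}}
    print(untested(matrix, ["google", "amazon"], "1.10.9"))  # ['amazon']
```

The zero-padded form sorts chronologically even as plain text. PEP 440-aware tools treat 2020.02.11 and 2020.2.11 as the same version.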
Eventually (maybe even for 2.0) we will be able to split the packages on a per-provider basis and release them independently, but that is something we can test and agree on later, when we discuss the overall release approach (possibly including semantic or calendar versioning for 2.* releases).

Let me know if you have any objections; if not, I will call a vote on this in a day or so.

J.

On Fri, Feb 14, 2020 at 9:46 PM Jarek Potiuk wrote:
> How about going both routes?
>
> 1) Provide one big "backport" package for 1.10
> 2) Once we release 2.0, split providers into micro-packages
>
> J.
>
> On Fri, Feb 14, 2020 at 9:30 PM Ash Berlin-Taylor wrote:
>
>> I think before we take this discussion any further we should work out what our plan is for AIP-8 https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303&src=contextnavpagetreemode (though it likely needs updating, as it still talks about contrib, which isn't relevant anymore).
>>
>> AIP-8 talks about "One hook or operator per package, following the "micro package" philosophy." as its long-term goal, and I think I broadly agree with that.
>> Given we have almost all the things in place to have this, I would rather we didn't release a single large "backport" package, only to have users then switch over to using new packages.
>>
>>> We can follow the same process/keys etc as for releasing the main airflow package, but I think it can be a bit more relaxed in terms of testing - and we can release it more often (as long as there will be new changes in providers). Those packages might be released on "as-is" basis - without guarantee that they work for all operators/hooks/sensors - and without guarantee that they will work for all 1.10.* versions.
>>
>> I'm in favour of this as a general idea.
>> My preferred way is to have each "provider" be its own package.
>> This is a slightly fuzzy concept, as for instance airflow.providers.google probably makes sense as a single package rather than separate .google.cloud and .google.marketing etc. packages (as per Kamil's comment on GitHub), but apache.airflow.providers.apache should _not_ be one package. So there's no easily expressible rule here, but (to me) there is an obvious way for each case.
>> Anyway, is the goal to provide smaller releases of providers (as per Terraform), or to backport to make 2.0 adoption easier?
>> -a
>>
>> On Feb 11 2020, at 3:43 pm, Jarek Potiuk wrote:
>>> Any more opinions?
>>>
>>> I gave some thought to that and I think we should:
>>> 1) release one big providers* package with CalVer versioning - apache-airflow-providers-backport-2020.02.11 if we were to release it today (we can always break it into smaller packages when we decide in 2.0). And then we could change the package names.
>>> 2) scheduled or regular releases: we should release them as needed - i.e. if we have a large change in one or a few of the providers, or a serious bugfix, we can release again.
>>> 3) it should be a manual effort involving voting and PMC approvals.
>>>
>>> What do you think?
>>> J.
>>>
>>> On Mon, Feb 10, 2020 at 2:43 PM Tomasz Urbaszek wrote:
>>>
>>>> I am ok with users building their own packages.
>>>> T.
>>>> On Mon, Feb 10, 2020 at 1:47 PM Jarek Potiuk wrote:
>>>>
>>>>> I think releasing should be a deliberate effort - with voting. We are releasing the source code and IMHO it should follow the same rules as releasing airflow itself.
>>>>> With this change, anyone will be able to build and prepare their own .whl packages and install them locally, so I do not think there is a need to automatically release those packages?
>>>>>
>>>>> However, releasing them in PyPI should be quite an important event, as PyPI releases are supposed to be used by users, not developers.
>>>>>
>>>>> J.
>>>>> On Mon, Feb 10, 2020 at 11:16 AM Tomasz Urbaszek wrote:
>>>>>
>>>>>> I think as long as we follow:
>>>>>>> The only people who are supposed to know about such developer resources are individuals actively participating in development or following the dev list and thus aware of the conditions placed on unreleased materials.
>>>>>>
>>>>>> we should be ok. My impression is that people are usually aware of what "nightly build" means and what the risks are. But it's just a suggestion that I made thinking about all those people who contribute integrations and can't use them "officially" for, let's say, the following 2 months. I was also thinking about this result: https://www.digitalocean.com/currents/december-2019/#generational-expectations-for-open-source-maintenance :)
>>>>>>
>>>>>> T.
>>>>>> On Mon, Feb 10, 2020 at 10:52 AM Ash Berlin-Taylor wrote:
>>>>>>>
>>>>>>> That might be a grey area according to my reading of the Apache release policies:
>>>>>>>
>>>>>>> https://apache.org/legal/release-policy.html#publication
>>>>>>>> During the process of developing software and preparing a release, various packages are made available to the development community for testing purposes. Projects MUST direct outsiders towards official releases rather than raw source repositories, nightly builds, snapshots, release candidates, or any other similar packages.
>>>>>>>> The only people who are supposed to know about such developer resources are individuals actively participating in development or following the dev list and thus aware of the conditions placed on unreleased materials.
>>>>>>> On Feb 10 2020, at 9:49 am, Tomasz Urbaszek wrote:
>>>>>>>> As per the frequency of releases, maybe we can consider "nightly builds" for providers? This way any contributed hook/operator will be pip-installable within 24h, so users can start to use it (= test it). This can help us reduce the number of releases with non-working integrations.
>>>>>>>>
>>>>>>>> Tomek
>>>>>>>> On Mon, Feb 10, 2020 at 12:11 AM Jarek Potiuk wrote:
>>>>>>>>>
>>>>>>>>> TL;DR: I wanted to discuss the approach we are going to take for backported providers packages. This is important for PMCs to decide how we are going to handle the release process for it, but I wanted to make it a public discussion so that anyone else can chime in and we can discuss it as a community.
>>>>>>>>>
>>>>>>>>> *Context*
>>>>>>>>> As explained in the other thread, we are close to having releasable/tested backport packages for the Airflow 1.10.* series for "providers" operators/hooks/packages.
>>>>>>>>> The main purpose of those backport packages is to let users migrate to the new operators before they migrate to the 2.0.* version of Airflow.
>>>>>>>>>
>>>>>>>>> The 2.0 version is still some time in the future, and we have a number of operators/hooks/sensors implemented that are not actively used/tested because they are only in the master version. There are a number of changes and fixes implemented only in master/2.0, so it would be great to use them in 1.10 - to use the new features but also to test the master versions as early as possible.
>>>>>>>>>
>>>>>>>>> Another great property of the backport packages is that they can be used to ease the migration process - users can install the "apache-airflow-providers" package and start using the new operators without migrating to a new Airflow. They can incrementally move all their DAGs to use the new "providers" package, and only after all of that is migrated can they migrate Airflow to 2.0, when they are ready. That allows a smooth migration path for those users.
>>>>>>>>>
>>>>>>>>> *Testing*
>>>>>>>>> The issue we have with those packages is that we are not 100% sure if the "providers" operators will work with any 1.10.* Airflow version. There were no fundamental changes and they SHOULD work - but we never know until we test.
>>>>>>>>>
>>>>>>>>> Some preliminary tests with a subset of GCP operators show that the operators work out of the box. We have a big set of "system" tests for "GCP" operators that we will run semi-automatically to make sure that all GCP operators are working fine. This is already a great compatibility test (GCP operators are about 1/3 of all operators for Airflow). The approach used in GCP system tests can also be applied to other operators.
>>>>>>>>>
>>>>>>>>> I plan to have a matrix of "compatibilities" in https://cwiki.apache.org/confluence/display/AIRFLOW/Backported+providers+packages+for+Airflow+1.10.*+series and ask the community to add/run tests with other packages as well. It should be rather easy to add system tests for other systems, following the way it is implemented for GCP.
>>>>>>>>>
>>>>>>>>> *Releases*
>>>>>>>>> I think the most important decision is how we are going to release the packages. This is where PMCs have to decide, I think, as we have legal responsibility for releasing Apache Airflow official software.
>>>>>>>>>
>>>>>>>>> What we have now (after the PRs get merged): wheel and source packages are built automatically in Travis CI and uploaded to file.io ephemeral storage. The builds upload all the packages there - one big "providers" package and separate packages for each "provider".
>>>>>>>>>
>>>>>>>>> It would be great if we could officially publish packages for backporting in PyPI, however, and this is where we have to agree on the process/versioning/cadence.
>>>>>>>>>
>>>>>>>>> We can follow the same process/keys etc as for releasing the main airflow package, but I think it can be a bit more relaxed in terms of testing - and we can release it more often (as long as there will be new changes in providers). Those packages might be released on "as-is" basis - without guarantee that they work for all operators/hooks/sensors - and without guarantee that they will work for all 1.10.* versions.
>>>>>>>>> We can have the "compatibility" statement/matrix in our wiki, where people who tested some package might simply state that it works for them. At Polidea we can assume stewardship of the GCP packages and test them using our automated system tests for every release, for example - maybe others can assume stewardship for other providers.
>>>>>>>>>
>>>>>>>>> For that, we will need some versioning/release policy. I would say a CalVer <https://calver.org/> approach might work best (YYYY.MM.DD). And to make it simple, we should release one "big" providers package with all providers in it. We can have a roughly monthly cadence for it.
>>>>>>>>>
>>>>>>>>> But I am also open to any suggestions here.
>>>>>>>>> Please let me know what you think.
>>>>>>>>> J.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jarek Potiuk
>>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>>>>>>>
>>>>>>>>> M: +48 660 796 129
>>>>>>>>
>>>>>>>> --
>>>>>>>> Tomasz Urbaszek
>>>>>>>> Polidea | Software Engineer
>>>>>>>>
>>>>>>>> M: +48 505 628 493
