How about going both routes?

1) Provide one big "backport" package for 1.10
2) Once we release 2.0 split providers to micro-packages
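
For route 1, the CalVer-style name suggested earlier in this thread could be generated roughly as below. This is only a sketch: the helper names are made up for illustration, and the name format simply follows the apache-airflow-providers-backport-2020.02.11 example quoted further down. As far as I know, Python packaging tools normalise versions per PEP 440, so a published 2020.02.11 would likely be displayed as 2020.2.11.

```python
from datetime import date


def calver_version(release_date: date) -> str:
    """Format a release date as a zero-padded YYYY.MM.DD CalVer string."""
    return release_date.strftime("%Y.%m.%d")


def backport_package_name(release_date: date) -> str:
    """Build the name of the single big backport package for a release date.

    Hypothetical helper: the prefix is taken from the example in this thread,
    not from any agreed naming convention.
    """
    return f"apache-airflow-providers-backport-{calver_version(release_date)}"


if __name__ == "__main__":
    # Release date used as the example in the thread.
    print(backport_package_name(date(2020, 2, 11)))
```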

J.

On Fri, Feb 14, 2020 at 9:30 PM Ash Berlin-Taylor <[email protected]> wrote:

> I think before we take this discussion any further we should work out what
> our plan is for AIP-8
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303&src=contextnavpagetreemode
> ( though likely needs updating as it still talks about contrib which isn't
> relevant anymore)
>
> AIP-8 talks about "One hook or operator per package, following the "micro
> package" philosophy." as its long-term goal, and I think I broadly agree
> with that.
> Given we have almost all the things in place to have this, I would rather
> we didn't release a single large "backport" package, only to have
> users then switch over to using new packages.
> > We can follow the same process/keys etc as for releasing the main airflow
> > package, but I think it can be a bit more relaxed in terms of testing - and
> > we can release it more often (as long as there will be new changes in
> > providers). Those packages might be released on "as-is" basis - without
> > guarantee that they work for all operators/hooks/sensors - and without
> > guarantee that they will work for all 1.10.* versions.
>
> I'm in favour of this as a general idea.
> My preferred way is to have each "provider" be its own package. This is a
> slightly fuzzy concept, as for instance airflow.providers.google probably
> makes sense as a single package rather than a .google.cloud and
> .google.marketing etc packages (as per Kamil's comment on Github), but
> apache.airflow.providers.apache should _not_ be one package. So there's no
> easily expressible rule here, but (to me) there is an obvious way for each
> case.
> Anyway, is the goal to provide smaller releases of providers as per terraform,
> or to backport to make 2.0 adoption easier?
> -a
> On Feb 11 2020, at 3:43 pm, Jarek Potiuk <[email protected]> wrote:
> > Any more opinions?
> >
> > I gave this some thought and I think we should:
> > 1) release one big providers* package with CalVer versioning -
> > apache-airflow-providers-backport-2020.02.11 if we were to release it today
> > (we can always break them into smaller packages when we decide in 2.0). And
> > then we could change the package names.
> > 2) scheduled or regular releases. We should release them as needed - i.e.
> > if we have a large change in one or a few of the providers, or a serious
> > bugfix, we can release it again.
> > 3) it should be a manual effort involving voting and PMC approvals.
> >
> > What do you think?
> > J.
> >
> > On Mon, Feb 10, 2020 at 2:43 PM Tomasz Urbaszek <[email protected]>
> > wrote:
> >
> > > I am ok with users building their own packages.
> > > T.
> > > On Mon, Feb 10, 2020 at 1:47 PM Jarek Potiuk <[email protected]>
> > > wrote:
> > >
> > > > I think it should be a deliberate effort for releasing - with voting. We
> > > > are releasing the source code and IMHO it should follow the same rules as
> > > > releasing airflow itself.
> > > > With this change - anyone will be able to build and prepare their own .whl
> > > > packages and install them locally, so I do not think there is a need to
> > > > automatically release those packages?
> > > >
> > > > However releasing them in PyPi should be quite an important event as pypi
> > > > releases are supposed to be used by users not developers.
> > > >
> > > > J.
> > > > On Mon, Feb 10, 2020 at 11:16 AM Tomasz Urbaszek <[email protected]> wrote:
> > > >
> > > > > I think as long as we follow:
> > > > > > The only people who are supposed to know about such developer resources
> > > > > > are individuals actively participating in development or following the
> > > > > > dev list and thus aware of the conditions placed on unreleased materials.
> > > > >
> > > > > we should be ok. My impression is that people are usually aware of
> > > > > what "nightly build" means and what the risks are. But it's just a
> > > > > suggestion that I made thinking about all those people who contribute
> > > > > an integration and can't use it "officially" for, let's say, the following 2
> > > > > months. I was also thinking about this result
> > > > > https://www.digitalocean.com/currents/december-2019/#generational-expectations-for-open-source-maintenance
> > > > > :)
> > > > >
> > > > > T.
> > > > > On Mon, Feb 10, 2020 at 10:52 AM Ash Berlin-Taylor <[email protected]>
> > > > > wrote:
> > > > > >
> > > > > > That might be a grey area according to my reading of the Apache release
> > > > > > policies:
> > > > > >
> > > > > > https://apache.org/legal/release-policy.html#publication
> > > > > > > During the process of developing software and preparing a release,
> > > > > > > various packages are made available to the development community for
> > > > > > > testing purposes. Projects MUST direct outsiders towards official releases
> > > > > > > rather than raw source repositories, nightly builds, snapshots, release
> > > > > > > candidates, or any other similar packages. The only people who are supposed
> > > > > > > to know about such developer resources are individuals actively
> > > > > > > participating in development or following the dev list and thus aware of
> > > > > > > the conditions placed on unreleased materials.
> > > > > > On Feb 10 2020, at 9:49 am, Tomasz Urbaszek <[email protected]>
> > > > > > wrote:
> > > > > > > As per the frequency of releases maybe we can consider "nightly
> > > > > > > builds" for providers? In this way any contributed hook/operator will
> > > > > > > be pip-installable in 24h, so users can start to use it = test it.
> > > > > > > This can help us reduce the number of releases with unworking
> > > > > > > integrations.
> > > > > > >
> > > > > > > Tomek
> > > > > > > On Mon, Feb 10, 2020 at 12:11 AM Jarek Potiuk <[email protected]> wrote:
> > > > > > > >
> > > > > > > > TL;DR; I wanted to discuss the approach we are going to take for backported
> > > > > > > > providers packages. It is important for the PMC to decide how we are
> > > > > > > > going to run the release process for it, but I wanted to make it a public
> > > > > > > > discussion so that anyone else can chime in and we can discuss it as a
> > > > > > > > community.
> > > > > > > >
> > > > > > > > *Context*
> > > > > > > > As explained in the other thread - we are close to having releasable/tested
> > > > > > > > backport packages for Airflow 1.10.* series for "providers"
> > > > > > > > operators/hooks/packages. The main purpose of those backport packages is to
> > > > > > > > let users migrate to the new operators before they migrate to the 2.0.*
> > > > > > > > version of Airflow.
> > > > > > > >
> > > > > > > > The 2.0 version is still some time in the future, and we have a number of
> > > > > > > > operators/hooks/sensors implemented that are not actively used/tested
> > > > > > > > because they are only in the master version. There are a number of changes
> > > > > > > > and fixes only implemented in master/2.0 so it would be great to use them
> > > > > > > > in 1.10 - to use the new features but also to test the master versions as
> > > > > > > > early as possible.
> > > > > > > >
> > > > > > > > Another great property of the backport packages is that they can be used to
> > > > > > > > ease the migration process - users can install the "apache-airflow-providers"
> > > > > > > > package and start using the new operators without migrating to a new
> > > > > > > > Airflow. They can incrementally move all their DAGs to use the new
> > > > > > > > "providers" package and only after all that is migrated they can migrate
> > > > > > > > Airflow to 2.0 when they are ready. That gives those users a smooth
> > > > > > > > migration path.
> > > > > > > >
> > > > > > > > *Testing*
> > > > > > > > The issue we have with those packages is that we are not 100% sure if the
> > > > > > > > "providers" operators will work with any 1.10.* airflow version. There were
> > > > > > > > no fundamental changes and they SHOULD work - but we never know until we
> > > > > > > > test.
> > > > > > > >
> > > > > > > > Some preliminary tests with subset of GCP operators show that the operators
> > > > > > > > work out-of-the box. We have a big set of "system" tests for "GCP"
> > > > > > > > operators that we will run semi-automatically and make sure that all GCP
> > > > > > > > operators are working fine. This is already a great compatibility test (GCP
> > > > > > > > operators are about 1/3 of all operators for Airflow). But also the
> > > > > > > > approach used in GCP system tests can be applied to other operators.
> > > > > > > >
> > > > > > > > I plan to have a matrix of "compatibilities" in
> > > > > > > > https://cwiki.apache.org/confluence/display/AIRFLOW/Backported+providers+packages+for+Airflow+1.10.*+series
> > > > > > > > and
> > > > > > > > ask community to add/run tests with other packages as well. It should be
> > > > > > > > rather easy to add system tests for other systems - following the way it is
> > > > > > > > implemented for GCP.
> > > > > > > >
> > > > > > > > *Releases*
> > > > > > > > I think the most important decision is how we are going to release the
> > > > > > > > packages. This is where PMCs have to decide I think as we have legal
> > > > > > > > responsibility for releasing Apache Airflow official software.
> > > > > > > >
> > > > > > > > What we have now (after the PRs get merged) - wheel and source packages
> > > > > > > > built automatically in Travis CI and uploaded to file.io ephemeral storage.
> > > > > > > > The builds upload all the packages there - one big "providers" package and
> > > > > > > > separate packages for each "provider".
> > > > > > > >
> > > > > > > > It would be great if we could officially publish packages for backporting
> > > > > > > > in PyPI, however, and this is where we have to agree on the
> > > > > > > > process/versioning/cadence.
> > > > > > > >
> > > > > > > > We can follow the same process/keys etc as for releasing the main airflow
> > > > > > > > package, but I think it can be a bit more relaxed in terms of testing - and
> > > > > > > > we can release it more often (as long as there will be new changes in
> > > > > > > > providers). Those packages might be released on "as-is" basis - without
> > > > > > > > guarantee that they work for all operators/hooks/sensors - and without
> > > > > > > > guarantee that they will work for all 1.10.* versions. We can have the
> > > > > > > > "compatibility" statement/matrix in our wiki where people who tested some
> > > > > > > > package might simply state that it works for them. At Polidea we can assume
> > > > > > > > stewardship on the GCP packages and test them using our automated system
> > > > > > > > tests for every release for example - maybe others can assume
> > > > > > > > stewardship for other providers.
> > > > > > > >
> > > > > > > > For that - we will need some versioning/release policy. I would say a CalVer
> > > > > > > > <https://calver.org/> approach might work best (YYYY.MM.DD). And to make it
> > > > > > > > simple we should release one "big" providers package with all providers in.
> > > > > > > > We can have roughly monthly cadence for it.
> > > > > > > >
> > > > > > > > But I am also open to any suggestions here.
> > > > > > > > Please let me know what you think.
> > > > > > > > J.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Jarek Potiuk
> > > > > > > > Polidea <https://www.polidea.com/> | Principal Software
> Engineer
> > > > > > > >
> > > > > > > > M: +48 660 796 129 <+48660796129>
> > > > > > > > [image: Polidea] <https://www.polidea.com/>
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Tomasz Urbaszek
> > > > > > > Polidea | Software Engineer
> > > > > > >
> > > > > > > M: +48 505 628 493
> > > > > > > E: [email protected]
> > > > > > >
> > > > > > > Unique Tech
> > > > > > > Check out our projects!
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Tomasz Urbaszek
> > > > > Polidea | Software Engineer
> > > > >
> > > > > M: +48 505 628 493
> > > > > E: [email protected]
> > > > >
> > > > > Unique Tech
> > > > > Check out our projects!
> > > > >
> > > >
> > > >
> > > > --
> > > > Jarek Potiuk
> > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > >
> > > > M: +48 660 796 129 <+48660796129>
> > > > [image: Polidea] <https://www.polidea.com/>
> > > >
> > >
> > >
> > > --
> > > Tomasz Urbaszek
> > > Polidea <https://www.polidea.com/> | Software Engineer
> > >
> > > M: +48 505 628 493 <+48505628493>
> > > E: [email protected] <[email protected]>
> > >
> > > Unique Tech
> > > Check out our projects! <https://www.polidea.com/our-work>
> > >
> >
> >
> > --
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >
>
>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
