Re: [DISCUSS] Versioning and releases for apache/arrow components

Andrew Lamb Sun, 07 Apr 2024 06:06:30 -0700

I agree with all the other comments on this thread.

In my opinion, having smaller releases is key to being able to release more
frequently and to finding the relevant expertise.


We have had separate releases / votes for Arrow Rust (and Arrow DataFusion)
and it has served us quite well. The version schemes have diverged
substantially from the monorepo (we are on version 51.0.0 in arrow-rs, for
example), and it doesn't seem to have caused any major confusion among users.

Andrew



On Wed, Apr 3, 2024 at 2:11 PM Dewey Dunnington
<de...@voltrondata.com.invalid> wrote:

> Thank you Jacob for bringing this up! I am also in favor of decoupling
> versions (provided that the release managers are also in favor of
> this, since their time is required to implement this and because the
> ongoing consequences of separate releases disproportionately affect
> them).
>
> I think part of the vote fatigue is due to the complexity of
> releasing all of the components at the same time. Running the
> verification scripts for ADBC, nanoarrow, Rust, and Julia is fairly
> straightforward because those subprojects have a more limited scope. In
> contrast, I rarely manage to run the Arrow verification script without
> running into an error I don't understand. After many releases' worth of
> this happening, I have become hesitant to vote (or even try), and also
> because R, the component that I unofficially verify anyway, has never
> been part of the verification. Voting on a batch of version numbers
> seems like a good first step.
>
> I am also not concerned about the messaging of different versions for
> different components. The fact that integration tests pass at the
> moment of the release may be meaningful for those familiar with the
> repo, but I don't think that many people are aware of which components
> are tested in that way. As Weston noted, even for components that use
> Arrow C++, the implementation of Arrow C++ features may lag behind or
> be completely unrelated (Python being the exception).
>
> On Fri, Mar 29, 2024 at 9:47 AM Weston Pace <weston.p...@gmail.com> wrote:
> >
> > Thank you for bringing this up.  I'm in favor of this.  I think there are
> > several motivations, but the main ones are:
> >
> >  1. Decoupling the versions will allow components to have no release, or
> > only a minor release, when there are no breaking changes.
> >  2. We do have some vote fatigue, I think, and we don't want to make that
> > worse.
> >  3. Anything we can do to ease the burden on release managers is good.
> >
> > If I understand what you are describing, then I think it satisfies points
> > 1 & 2.  I am not familiar enough with the release management process to
> > speak to #3.
> >
> > > Voting in one thread on
> > > all components/a subset of components per voter and the surrounding
> > > technicalities is something I would like to hear some opinions on.
> >
> > I am in favor of decoupling the version numbers.  I do think batched
> > quarterly releases are still a good way to avoid vote fatigue.  Perhaps
> > we can have a single vote on a batch of version numbers (e.g. please vote
> > on the batched release containing CPP version X, Go version Y, JS version
> > Z).
> >
> > > A more meta question is about the messaging that different versioning
> > > schemes carry, as it might no longer be obvious on first glance which
> > > versions are compatible or have the newest features.
> >
> > I am not concerned about this.  One of the advantages of Arrow is that we
> > have a stable C ABI (C Data Interface) and a stable IPC mechanism (IPC
> > serialization), which means that version compatibility is rarely a
> > difficulty or major concern.  Plus, regarding individual features, our
> > solution already requires a compatibility table (
> > https://arrow.apache.org/docs/status.html).  Changing the versioning
> > strategy will not make this any worse.
> >
> > On Thu, Mar 28, 2024 at 1:42 PM Jacob Wujciak <assignu...@apache.org> wrote:
> >
> > > Hello Everyone!
> > >
> > > I would like to resurface the discussion of separate
> > > versioning/releases/voting for monorepo components. We have previously
> > > touched on this topic, mostly in the community meetings and spread
> > > across multiple, only tangentially related threads. I think a focused
> > > discussion can be a bit more results-oriented, especially now that we
> > > deviate almost regularly from the quarterly release cadence with minor
> > > releases. My hope is that discussing this and adapting our process can
> > > lower the amount of work required and ease the pressure on our release
> > > managers (Thank you Raúl and Kou!).
> > >
> > > I think the core of the topic is separate versioning for components, as
> > > otherwise separate releases have only limited value. From a technical
> > > perspective, standalone implementations like Go or JS are the easiest
> > > to handle in that regard: they can just follow their ecosystem
> > > standards, which users have already requested (major releases in Go
> > > require manual editing across a code base, as dependencies are usually
> > > pinned to a major version).
> > >
> > > For Arrow C++ bindings like Arrow R and PyArrow, having distinct
> > > versions would require additional work, both to enable the use of
> > > different versions and to ensure that version compatibility is
> > > monitored and updated where needed.
> > >
> > > For Arrow R we have already implemented these changes (for different
> > > reasons) and have backwards compatibility with libarrow >= 13.0.0. From
> > > a PyArrow user's standpoint this is likely irrelevant, as most users get
> > > binary wheels from PyPI. Users who regularly build PyArrow from source
> > > are also capable of managing potentially different libarrow version
> > > requirements, since this is already necessary to build the package,
> > > just with an exact version match.
> > >
> > > A more meta question is about the messaging that different versioning
> > > schemes carry, as it might no longer be obvious on first glance which
> > > versions are compatible or have the newest features. That said, I would
> > > argue that this is a marginal concern at best, as there is no guarantee
> > > of feature parity between different components with the same version.
> > > Breaking that implicit expectation with separate versions could be seen
> > > as clearer. If a component only receives dependency bumps or minor bug
> > > fixes, releasing it with a patch version aligns much better with
> > > expectations than a major version bump. In addition, there are already
> > > several differently versioned libraries in the apache/arrow-* ecosystem
> > > that are released outside of the monorepo release process. A proper
> > > support policy for each component would also be required, but it could
> > > just default to 'current major release' as it is now.
> > >
> > > From an ASF perspective, there is no requirement to release the entire
> > > repository at once, as the actual release artifact is the source
> > > tarball. As long as that is verified and voted on by the PMC, it is an
> > > official release.
> > >
> > > This brings me to the release process and voting. I think it is pretty
> > > clear that completely decoupling all components and their release
> > > processes isn't feasible at the moment, mainly from a technical
> > > perspective (crossbow), and it would likely also lead to vote fatigue.
> > > We have made efforts to make the verification required for the vote
> > > easier and will continue these efforts. Though I can see some of the
> > > components managing their own releases (e.g. R, as we already do with
> > > post-release tasks due to CRAN), a continued quarterly 'batch release'
> > > seems like a more appealing solution and would still allow us to use
> > > separate versions. Voting in one thread on all components/a subset of
> > > components per voter and the surrounding technicalities is something I
> > > would like to hear some opinions on.
> > >
> > > In my opinion, being stricter with release requirements for components
> > > might lead to smaller/less active components not releasing. This seems
> > > like a bad thing at first glance, but it might also spur the user
> > > community to get involved when the reassuring, regular releases dry up,
> > > and it would reflect the reality of the component's development
> > > situation.
> > >
> > > I am eager to hear your thoughts!
> > >
> > > Best
> > > Jacob
> > >
>
