Re: [DISCUSS] Versioning and releases for apache/arrow components

Weston Pace Fri, 29 Mar 2024 05:45:48 -0700

Thank you for bringing this up.  I'm in favor of this.  I think there are
several motivations but the main ones are:


 1. Decoupling the versions will allow components to have no release, or
only a minor release, when there are no breaking changes
 2. We do have some vote fatigue, I think, and we don't want to make that
worse.
 3. Anything we can do to ease the burden of release managers is good

If I understand what you are describing then I think it satisfies points 1
& 2.  I am not familiar enough with the release management process to speak
to #3.

> Voting in one thread on
> all components/a subset of components per voter and the surrounding
> technicalities is something I would like to hear some opinions on.

I am in favor of decoupling the version numbers.  I do think batched
quarterly releases are still a good thing to avoid vote fatigue.  Perhaps
we can have a single vote on a batch of version numbers (e.g. please vote
on the batched release containing CPP version X, Go version Y, JS version
Z).

> A more meta question is about the messaging that different versioning
> schemes carry, as it might no longer be obvious on first glance which
> versions are compatible or have the newest features.

I am not concerned about this.  One of the advantages of Arrow is that we
have a stable C ABI (C Data Interface) and a stable IPC mechanism (IPC
serialization) and this means that version compatibility is rarely a
difficulty or major concern.  Plus, regarding individual features, our
solution already requires a compatibility table (
https://arrow.apache.org/docs/status.html).  Changing the versioning
strategy will not make this any worse.

On Thu, Mar 28, 2024 at 1:42 PM Jacob Wujciak <assignu...@apache.org> wrote:

> Hello Everyone!
>
> I would like to resurface the discussion of separate
> versioning/releases/voting for monorepo components. We have previously
> touched on this topic mostly in the community meetings and spread across
> multiple, only tangentially related threads. I think a focused discussion can
> be a bit more results oriented, especially now that we almost regularly
> deviate from the quarterly release cadence with minor releases. My hope is
> that discussing this and adapting our process can lower the amount of work
> required and ease the pressure on our release managers (Thank you Raúl and
> Kou!).
>
> I think the base of the topic is the separate versioning for components as
> otherwise separate releases only have limited value. From a technical
> perspective standalone implementations like Go or JS are the easiest to
> handle in that regard, they can just follow their ecosystem standards,
> which has been requested by users already (major releases in Go require
> manual editing across a code base as dependencies are usually pinned to a
> major version).
>
> For Arrow C++ bindings like Arrow R and PyArrow having distinct versions
> would require additional work to both enable the use of different versions
> and ensure version compatibility is monitored and potentially updated if
> needed.
>
> For Arrow R we have already implemented these changes for different reasons
> and have backwards compatibility with libarrow >= 13.0.0. From a user
> standpoint of PyArrow this is likely irrelevant as most users get binary
> wheels from PyPI. If a user regularly builds PyArrow from source they are
> also capable of managing potentially different libarrow version
> requirements, as this is already necessary to build the package, just with
> an exact version match.
>
> A more meta question is about the messaging that different versioning
> schemes carry, as it might no longer be obvious on first glance which
> versions are compatible or have the newest features. Though I would argue
> that this is a marginal concern at best as there is no guarantee of feature
> parity between different components with the same version. Breaking that
> implicit expectation with separate versions could be seen as clearer. If a
> component only receives dependency bumps or minor bug fixes, releasing this
> component with a patch version aligns much better with expectations than a
> major version bump. In addition there are already several differently
> versioned libraries in the apache/arrow-* ecosystem that are released
> outside of the monorepo release process.  A proper support policy for each
> component would also be required but could just default to 'current major
> release' as it is now.
>
> From an ASF perspective there is no requirement to release the entire
> repository at once as the actual release artifact is the source tarball. As
> long as that is verified and voted on by the PMC it is an official release.
>
> This brings me to the release process and voting. I think it is pretty
> clear that completely decoupling all components and their release processes
> isn't feasible at the moment, mainly from a technical perspective
> (crossbow) and would likely also lead to vote fatigue. We have made efforts
> to make the verification required for the vote easier and will continue
> these efforts. Though I can see some of the components managing their own
> releases (e.g. R, as we already do with post-release tasks due to CRAN), a
> continued quarterly 'batch release' seems like a more appealing solution
> and would still allow us to use separate versions.  Voting in one thread on
> all components/a subset of components per voter and the surrounding
> technicalities is something I would like to hear some opinions on.
>
> In my opinion being stricter with release requirements for components might
> lead to smaller/less active components not releasing. This seems like a
> bad thing at first glance but might also spur the user community to get
> involved when the reassuring, regular releases dry up and reflect the
> reality of the development situation of the component.
>
> I am eager to hear your thoughts!
>
> Best
> Jacob
>
