Thank you, Jacob, for bringing this up! I am also in favor of decoupling versions (provided that the release managers are also in favor of this, since their time is required to implement it and because the ongoing consequences of separate releases disproportionately affect them).
I think the vote fatigue is partly due to the complexity of releasing all of the components at the same time. Running the verification script for ADBC, nanoarrow, Rust, or Julia is fairly straightforward because those subprojects have a more limited scope. In contrast, I am rarely successful running the Arrow verification script without hitting an error I don't understand, and I have become hesitant to vote (or try) as a cumulative result of many releases' worth of this happening (and because R, the component that I unofficially verify anyway, has never been a part of verification).

Voting on a batch of version numbers seems like a good first step.

I am also not concerned about the messaging of different versions for different components. The fact that integration tests pass at the moment of the release may be meaningful for those familiar with the repo, but I don't think that many people are aware of which components are tested in that way. As Weston noted, even for components that use Arrow C++, the implementation of Arrow C++ features may lag behind or be completely unrelated (Python being the exception).

On Fri, Mar 29, 2024 at 9:47 AM Weston Pace <weston.p...@gmail.com> wrote:
>
> Thank you for bringing this up. I'm in favor of this. I think there are
> several motivations but the main ones are:
>
> 1. Decoupling the versions will allow components to have no release, or
> only a minor release, when there are no breaking changes
> 2. We do have some vote fatigue I think and we don't want to make that
> more difficult.
> 3. Anything we can do to ease the burden of release managers is good
>
> If I understand what you are describing then I think it satisfies points 1
> & 2. I am not familiar enough with the release management process to speak
> to #3.
>
> > Voting in one thread on
> > all components/a subset of components per voter and the surrounding
> > technicalities is something I would like to hear some opinions on.
>
> I am in favor of decoupling the version numbers. I do think batched
> quarterly releases are still a good thing to avoid vote fatigue. Perhaps
> we can have a single vote on a batch of version numbers (e.g. please vote
> on the batched release containing CPP version X, Go version Y, JS version
> Z).
>
> > A more meta question is about the messaging that different versioning
> > schemes carry, as it might no longer be obvious on first glance which
> > versions are compatible or have the newest features.
>
> I am not concerned about this. One of the advantages of Arrow is that we
> have a stable C ABI (C Data Interface) and a stable IPC mechanism (IPC
> serialization) and this means that version compatibility is rarely a
> difficulty or major concern. Plus, regarding individual features, our
> solution already requires a compatibility table
> (https://arrow.apache.org/docs/status.html). Changing the versioning
> strategy will not make this any worse.
>
> On Thu, Mar 28, 2024 at 1:42 PM Jacob Wujciak <assignu...@apache.org> wrote:
>
> > Hello Everyone!
> >
> > I would like to resurface the discussion of separate
> > versioning/releases/voting for monorepo components. We have previously
> > touched on this topic mostly in the community meetings and spread across
> > multiple, only tangential related threads. I think a focused discussion can
> > be a bit more results oriented, especially now that we almost regularly
> > deviate from the quarterly release cadence with minor releases. My hope is
> > that discussing this and adapting our process can lower the amount of work
> > required and ease the pressure on our release managers (Thank you Raúl and
> > Kou!).
> >
> > I think the base of the topic is the separate versioning for components as
> > otherwise separate releases only have limited value.
> > From a technical
> > perspective standalone implementations like Go or JS are the easiest to
> > handle in that regard, they can just follow their ecosystem standards,
> > which has been requested by users already (major releases in Go require
> > manual editing across a code base as dependencies are usually pinned to a
> > major version).
> >
> > For Arrow C++ bindings like Arrow R and PyArrow having distinct versions
> > would require additional work to both enable the use of different versions
> > and ensure version compatibility is monitored and potentially updated if
> > needed.
> >
> > For Arrow R we have already implemented these changes for different reasons
> > and have backwards compatibility with libarrow >= 13.0.0. From a user
> > standpoint of PyArrow this is likely irrelevant as most users get binary
> > wheels from pypi, if a user regularly builds PyArrow from source they are
> > also capable of managing potentially different libarrow version
> > requirements as this is already necessary to build the package just with an
> > exact version match.
> >
> > A more meta question is about the messaging that different versioning
> > schemes carry, as it might no longer be obvious on first glance which
> > versions are compatible or have the newest features. Though I would argue
> > that this a marginal concern at best as there is no guarantee of feature
> > parity between different components with the same version. Breaking that
> > implicit expectation with separate versions could be seen as clearer. If a
> > component only receives dependency bumps or minor bug fixes, releasing this
> > component with a patch version aligns much better with expectations than a
> > major version bump. In addition there are already several differently
> > versioned libraries in the apache/arrow-* ecosystem that are released
> > outside of the monorepo release process.
> > A proper support policy for each
> > component would also be required but could just default to 'current major
> > release' as it is now.
> >
> > From an ASF perspective there is no requirement to release the entire
> > repository at once as the actual release artifact is the source tarball. As
> > long as that is verified and voted on by the PMC it is an official release.
> >
> > This brings me to the release process and voting. I think it is pretty
> > clear that completely decoupling all components and their release processes
> > isn't feasible at the moment, mainly from a technical perspective
> > (crossbow) and would likely also lead to vote fatigue. We have made efforts
> > to ease the verification required for the vote easier and will continue
> > these efforts. Though I can see some of the components managing their own
> > releases (e.g. R, as we do with post release tasks already due to CRAN, ) a
> > continued quarterly 'batch release' seems like a more appealing solution
> > and would still allow us to use separate versions. Voting in one thread on
> > all components/a subset of components per voter and the surrounding
> > technicalities is something I would like to hear some opinions on.
> >
> > In my opinion being stricter with release requirements for components might
> > lead to smaller/less active components not releasing. This seems like a
> > bad thing at first glance but might also spur the user community to get
> > involved when the reassuring, regular releases dry up and reflect the
> > reality of the development situation of the component.
> >
> > I am eager to hear your thoughts!
> >
> > Best
> > Jacob
> >