On 10/19/22 20:47, Will Jones wrote:
One particular type of defect whose fixes we might want to consider
backporting to supported versions is the kind that silently produces
incorrect data. Unlike a defect that causes a crash, it's not easy for a
user to know they are affected.
Here are a few examples:
* ARROW-17453: [Go][C++][Parquet] Inconsistent Data with Repetition Levels
[1] (fixed in 10.0.0)
* ARROW-17995: [C++] Fix json decimals not being rescaled based on the
explicit schema [2] (fixed in 10.0.0)
* ARROW-14523: [C++] Fix potential data loss in S3 multipart upload [3]
(fixed in 7.0.0)
Also, I know we have high release costs for new versions, but is that also
true for backporting fixes? Unlike new releases, if we were creating a
bugfix release, we are presumably starting from a much more stable point,
right?
Thanks,
Will Jones
[1] https://issues.apache.org/jira/browse/ARROW-17453
[2] https://issues.apache.org/jira/browse/ARROW-17995
[3] https://issues.apache.org/jira/browse/ARROW-14523
On Wed, Oct 19, 2022 at 9:32 AM Todd Farmer <t...@voltrondata.com.invalid>
wrote:
Hi,
I've been thinking a lot about maintenance and lifecycle policies and
defect classification recently - I'm very grateful this is being raised. I
believe establishing such policies will prove instrumental in enabling
adoption of Arrow for a number of use cases that prioritize stability over
innovation.
On Wed, Oct 19, 2022 at 5:42 AM Antoine Pitrou <anto...@python.org> wrote:
Hi Kou,
On 19/10/2022 at 06:29, Sutou Kouhei wrote:
My proposal: We maintain the last major release:
* We maintain 9.Y.Z when the latest major release is 9.0.0
* We may release 9.Y.Z when we find a problem such as a
security vulnerability in 9.Y.Z
* We drop support for 9.Y.Z when we release 10.0.0
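The rule proposed above could be sketched roughly as follows. This is only an illustration of "maintain the last major release"; the function name `is_maintained` and the version-string parsing are assumptions, not existing Arrow tooling.

```python
def is_maintained(version: str, latest_major: int) -> bool:
    """Return True if `version` (e.g. "9.0.1") belongs to the latest major release.

    Hypothetical helper: under the proposal, only X.Y.Z versions whose major
    number X equals the latest released major are maintained.
    """
    major = int(version.split(".")[0])
    return major == latest_major


# While 9.0.0 is the latest major release, 9.Y.Z is maintained.
print(is_maintained("9.0.1", latest_major=9))
# Once 10.0.0 is released, support for 9.Y.Z is dropped.
print(is_maintained("9.0.1", latest_major=10))
```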
That sounds ok to me, but is there a more precise criterion than "we
find a problem"?
For most users, backwards compatibility and supported platforms are
likely more important than the version number. If there are many
breaking API changes, this increases the cost of using Arrow, so
supporting easy continuous use of Arrow should be the goal.
In the past, we have from time to time done maintenance releases based
on annoying bugs/regressions. But not always.
I very much agree, and actually think there are multiple questions to
answer here:
1. Which class of defects should be allowed to be merged into a maintenance
branch?
2. Which class of defects must be fixed in a supported maintenance branch?
3. Which class of defects should trigger a maintenance release once a fix
is made to the branch?
4. Which versions should be targeted in backporting a defect fix? How long
will a release receive maintenance support?
5. Which class of defects can be batched into a future maintenance release,
and which need immediate release?
6. What delivery artifacts are needed for maintenance releases? Can some
things be source-only?
Today, any fix may be a candidate for backporting to a maintenance branch
if there's support for doing so in a vote. I believe it might be useful to
more formally triage defects in part to establish policy answering these
questions. For example:
* How severe is the defect? Does it produce wrong results? Cause crashes?
Or is it an annoying spelling error in a log message?
* How widespread is the impact? Is everybody who uses Arrow going to be
affected by this? Or is it only triggered by some very obscure use case?
* How accessible is any workaround?
* How much risk is involved in a fix?
Having a common framework to classify the elements above would enable
policy that clearly defines which defects can (or should, eventually) get
what attention.
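To make the idea concrete, the four triage questions could be captured in a small record like the one below. This is a minimal sketch only: the class names, the enum levels, and their ordering are all assumptions made for illustration, not part of any agreed Arrow process.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Ordering is an assumption: silent wrong results ranked above crashes,
    # because a crash is at least visible to the user.
    COSMETIC = 1        # e.g. a spelling error in a log message
    CRASH = 2
    WRONG_RESULTS = 3


class Reach(IntEnum):
    OBSCURE_USE_CASE = 1
    SOME_USERS = 2
    ALL_USERS = 3


class Workaround(IntEnum):
    EASY = 1
    DIFFICULT = 2
    NONE = 3


class FixRisk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Triage:
    """One hypothetical triage record per defect."""
    severity: Severity
    reach: Reach
    workaround: Workaround
    fix_risk: FixRisk


# A made-up defect: silently wrong data, narrow reach, no workaround, low-risk fix.
example = Triage(Severity.WRONG_RESULTS, Reach.OBSCURE_USE_CASE,
                 Workaround.NONE, FixRisk.LOW)
print(example)
```

A policy could then be expressed as thresholds over these fields, which is where the actual discussion would need to happen.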
If there is interest in the community, I'll continue a draft proposal I'm
working on to formalize triage to capture these aspects. Any such triage
process would be entirely optional for work done against master/main, but
could be required for assessing potential backports as needed.
I'll also note that I recognize Arrow may not currently see a need to
answer all the questions about maintenance/lifecycle policy today, or may
not have the resources needed to deliver what may be desired. It takes a
lot of work to generate a release today. I think it's completely
appropriate to commit only to what can be delivered today, with an eye
towards incremental improvement. For example, an entirely acceptable policy
might be:
* Only the most recently-released minor version is eligible for defect
fixes.
* Security vulnerabilities with CVSS 3.0 score >= 7.0 (High) should trigger
a maintenance release.
* Fixes for defects of any nature may be backported if they reach
established thresholds (TBD) for severity, widespread impact, workaround
accessibility and risk. Such fixes will be incorporated into the
maintenance branch and made available via source, but no release will be
produced unless triggered by a subsequent security vulnerability fix.
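The example policy above could be sketched as a small decision function. This is a sketch under stated assumptions: the `Fix` record and `decide` function are hypothetical names, and because the backport thresholds are still TBD, they are represented here by a single boolean that someone would have to set after triage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fix:
    """Hypothetical description of a candidate backport."""
    targets_latest_minor: bool               # only the most recent minor is eligible
    cvss_score: Optional[float] = None       # CVSS 3.0 score; None if not a security fix
    meets_backport_thresholds: bool = False  # thresholds are TBD in the proposal


def decide(fix: Fix) -> str:
    """Apply the three example policy rules in order."""
    if not fix.targets_latest_minor:
        return "not eligible"
    # High-severity security vulnerabilities trigger a maintenance release.
    if fix.cvss_score is not None and fix.cvss_score >= 7.0:
        return "backport and release"
    # Anything else (including lower-scored security fixes, which the example
    # policy does not address separately) only lands on the branch.
    if fix.meets_backport_thresholds:
        return "backport, source only"
    return "no backport"


print(decide(Fix(targets_latest_minor=True, cvss_score=7.5)))
print(decide(Fix(targets_latest_minor=True, meets_backport_thresholds=True)))
```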
It may be good to disclose known problems on a site associated with the
release. Bug tickets are helpful for work in progress, but "won't fix" or
"cannot fix" resolutions associated with a release may be hard to find. As
an example, in the current releases 10.0.0 and 10.0.1, there are problems
with old glibc on CentOS 7 producing incorrect results for a timestamp
comparison. It is unlikely this will be fixed, but it may be something
users want to be aware of.
I think that we can maintain multiple major releases without a
high release cost by implementing the following:
* Green nightly CI
* Nightly CI for all maintained branches (maint-X.Y.Z)
* We need to reduce the time CI takes
* ...
I'm afraid "green nightly CI" is more of an ideal than a reality given
the breadth and complexity of our fleet of CI jobs. We still seem to
have stability problems in some areas (perhaps Acero?) but there are
also regular regressions due to changes in third-party packages.
Would this still be true if executed against a maintenance release branch?
I understand why this would drift for main/master, but if a version branch
is green when first released, and only accepts limited, qualified
backported fixes, it should be much easier to "keep" green, I'd think.
Thanks,
Todd