On Wed, Mar 31, 2021 at 02:29:40PM +0200, Vincent Bernat wrote:
> 31 March 2021 12:46 +02, Willy Tarreau:
> 
> > On the kernel Greg solved all this by issuing all versions very
> > frequently: as long as you produce updates faster than users are
> > willing to deploy them, they can choose what to do. It just requires
> > a bandwidth that we don't have :-/ Some weeks several of us work full
> > time on backports and tests! Right now we've reached a point where
> > backports can prevent us from working on mainline, and where this lack
> > of time increases the risk of regressions, and the regressions require
> > more backport time.
> 
> Wouldn't this mean there are too many versions in parallel?

It cannot be summed up this easily. Normally, old versions are not
released often so they don't cost much. But not releasing them often
complicates the backports and their testing so it's still better to
try to feed them along with the other ones. However, releasing them
in parallel with the other ones makes them more susceptible to stupid
issues like the last build failure with libmusl. But not releasing them
wouldn't change much given that build failures in certain environments
are only detected once the release sends the signal that it's time to
update :-/

With this said, while the adoption of non-LTS versions has added one
to two versions to the series, it has significantly reduced the pain
of certain backports precisely because it resulted in splitting the
population of users. So at the cost of ~1 more version in the pipe,
we get more detailed reports from users who are more accustomed to
enabling core dumps, firing gdb, applying patches etc, which reduces
the time spent on bugs and increases the confidence in fixes that get
backported. So I'd say that it remains a very good investment. However
I wanted to make sure we shorten the non-LTS versions' life to limit
the in-field fragmentation. And this works extremely well (I'm very
grateful to our users for this, and I suspect that the status banner
in the executable reminding about EOL helps). We have probably not
seen a single 2.1 report in the issues over the last 3-4 months.
And I expect that 6 months after 2.4 is released, we won't read about
2.3 anymore.

Also if you dig into the issue tracker, you'll see a noticeable number
of users who agree to run some tests on 2.3 to verify whether it fixes
an issue they face in 2.2. We're usually not asking for an upgrade, just
a test on a very close version. This flexibility is very important as
well.

So the number of parallel versions is one aspect of the problem but
it's also an important part of the solution. I hope we can continue to
maintain short lives for non-LTS but at the same time it must remain a
win-win: if we get useful reports on one version that are valid for
other ones as well, I'm fine with extending it a little bit as we did
for 1.9; there's no reason why those making the most effort should be
the first ones punished.

Overall the real issue remains the number of bugs we introduce in the
code and that is unavoidable when working on lower layers where a good
test coverage is extremely difficult to achieve. Making smaller and more
detailed patches is mandatory. Continuing to add reg-tests definitely
helps a lot. We've added more than one reg-test per week since 2.3,
that's definitely not bad at all, but this effort must continue! The
CI reports few false positives now and the situation has tremendously
improved over the last 2 years. So with better code we can hope for
fewer bugs, fewer fixes, fewer backports, hence less risk of regressions.

> > I think that the real problem arrives when a version becomes generally
> > available in distros. And distro users are often the ones with the least
> > autonomy when it comes to rolling back. When you build from sources,
> > you're more at ease. Thus a nice solution would probably be to
> > add an idle period between a stable release and its appearance in
> > distros so that it really gets some initial deployment before becoming
> > generally available. And I know that some users complain when they do
> > not immediately see their binary package, but that's something we can
> > easily explain and document. We could even indicate a level of confidence
> > in the announcement messages. It has the merit of respecting the principle
> > of least surprise for everyone in the chain, including those like you
> > and me involved in the release cycle and who did not necessarily plan
> > to stop all activities to work on yet-another-release because the
> > long-awaited fix-of-the-month broke something and its own fix broke
> > something else.
> 
> We can do that. In the future, I may even tackle all the problems at
> once: providing easy access to old versions and having two versions of
> each repository: one with new versions immediately available and one
> with a semi-fixed delay.

Ah I really like this! Your packages definitely are the most exposed
ones so this could very efficiently reduce the exposure in the early
days and still provide a downgrade path for those who would be the
unlucky ones to first detect a regression. It could also represent an
incentive for users to follow updates more closely, knowing that if
2.2.14 breaks they can roll back to 2.2.13 so that it's better for them
not to leave too large steps between updates.
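
To make the two-repository idea concrete, here is a rough sketch of how
an "immediate" and a "delayed" view could be derived from one release
history. Everything in it is an assumption made for illustration: the
function names, the 14-day delay and the publication dates are invented
and do not describe how the packages are actually built or published.

```python
from datetime import date, timedelta

# Hypothetical release history: (version, publication date).
# The dates are invented for the example.
RELEASES = [
    ("2.2.12", date(2021, 3, 1)),
    ("2.2.13", date(2021, 4, 1)),
    ("2.2.14", date(2021, 4, 29)),
]


def visible_versions(releases, today, delay_days=0):
    """Versions published at least delay_days before today, oldest first."""
    cutoff = today - timedelta(days=delay_days)
    ordered = sorted(releases, key=lambda item: item[1])
    return [version for version, published in ordered if published <= cutoff]


def latest(releases, today, delay_days=0):
    """Newest version a given repository view would offer, or None."""
    visible = visible_versions(releases, today, delay_days)
    return visible[-1] if visible else None


today = date(2021, 5, 3)
immediate = latest(RELEASES, today)               # no delay
delayed = latest(RELEASES, today, delay_days=14)  # assumed 14-day delay
rollback_choices = visible_versions(RELEASES, today)
```

With these invented dates the immediate view offers 2.2.14 while the
delayed view still offers 2.2.13, and keeping the full list available is
what gives a user hit by a regression something to roll back to.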

Thanks!
Willy
