On Wed, Mar 31, 2021 at 10:17:35AM +0200, William Dauchy wrote:
> On Tue, Mar 30, 2021 at 6:59 PM Willy Tarreau <w...@1wt.eu> wrote:
> > HAProxy 2.3.9 was released on 2021/03/30. It added 5 new commits
> > after version 2.3.8.
> >
> > This essentially fixes the rate counters issue that popped up in 2.3.8
> > after the previous fix for the rate counters already.
> >
> > What happened is that the internal time in milliseconds wraps every 49.7
> > days and that the new global counter used to make sure rate counters are
> > now stable across threads starts at zero and is initialized when older
> > than the current thread's current date. It just happens that the wrapping
> > happened a few hours ago at "Mon Mar 29 23:59:46 CEST 2021" exactly and
> > that any process started since this date and for the next 24 days doesn't
> > validate this condition anymore, hence doesn't rotate its rate counters
> > anymore.
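
As a side illustration of the wrap described in the quoted announcement,
here is a minimal sketch in C. It is not HAProxy's code: tick_is_older(),
wrap_period_days() and check_examples() are hypothetical names, and the
sketch assumes that ticks are compared through a signed 32-bit difference,
which is one common way to order values on a wrapping clock. Under that
assumption, a reference left at zero looks older than the current tick for
only half of the 49.7-day period, which would be consistent with the
roughly 24-day window mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an HAProxy function): returns 1 if tick 'a'
 * is older than tick 'b' on a wrapping 32-bit millisecond clock.
 * Assumption: ordering is decided by the sign of the 32-bit difference,
 * on a two's complement platform. */
static int tick_is_older(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Number of days a 32-bit millisecond counter takes to wrap:
 * 2^32 ms divided by the number of milliseconds in a day. */
static double wrap_period_days(void)
{
    return 4294967296.0 / (1000.0 * 60.0 * 60.0 * 24.0);
}

/* Illustrative checks with a reference tick left at zero. */
static void check_examples(void)
{
    uint32_t ref = 0;

    /* Current tick in the first half of the range: zero is older. */
    assert(tick_is_older(ref, 0x10000000u) == 1);

    /* Current tick in the second half of the range: zero now appears
     * to be in the future, so an "is older" test no longer passes. */
    assert(tick_is_older(ref, 0x90000000u) == 0);

    /* The full period is a bit under 50 days. */
    assert(wrap_period_days() > 49.7 && wrap_period_days() < 49.8);
}
```

Calling check_examples() from a test program exercises the three cases. The
two tick values are arbitrary and were picked well away from the midpoint of
the range, where the sign of the difference flips.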
> 
> Thanks Willy for the quick update. That's a good example to avoid
> pushing stable versions at the same time, so we have opportunities to
> find those regressions.

I know and we're trying to separate them but it considerably increases the
required effort. In addition there is a nasty effect resulting from shifted
releases, which is that it ultimately results in older releases possibly
having more recent fixes than recent ones. And it will happen again with
2.2.12 which I hope to issue today. It will contain the small fix for the
silent-drop issue (which is already in 2.3 of course, but was merged after
2.3.9). The reporter of the issue is on 2.2, so it would not be fair to them
to release another 2.2 without it (or we'd fall into a bureaucratic process
that doesn't serve users anymore). So 2.2.12 will contain this fix. But
if that person finally decides to upgrade to 2.3.9 a week or two later, they
may face the bug again. It's not a dramatic one so that's acceptable, but
that shows the difficulties of the process.

In an ideal world, there would be lots of tests in production on stable
versions. The reality is that nobody (me included) is interested in upgrading
prod servers running flawlessly to just confirm there's no nasty surprise
with the forthcoming release, because either there's a bug and you prefer
someone else to spot it first, or there's no problem and you'll upgrade
once the final version is ready.

With this option left off the table, it's clear that the only option that
remains is the shifted versions. But here it would not even have helped,
because the code worked on Monday and broke on Tuesday!

What I think we can try to do (and we discussed this with the other
co-maintainers) is to push the patches but not immediately emit the releases
(so that the backport work is still factored), and to keep the tricky
patches in the -next branch to prevent them from being backported too far
too fast (it will save us from the risk of missing them if not merged).

Overall the most important solution is that we release often enough so
that in case of a regression that affects some users, they can stay on
the previous version a little bit more without having to endure too many
bugs. And if we don't have too many fixes per release, it's easy to emit
yet another small one immediately after to fix a single regression. But
over the last week we've been flooded on multiple channels by many reports
and then it becomes really hard to focus on a single issue at once for a
release :-/

Cheers,
Willy
