Hi Tim,

On Thu, May 03, 2018 at 03:34:01PM +0200, Tim Düsterhus wrote:
> >> Especially since the issue happens randomly: Sometimes the additional
> >> headers fit by chance. Sometimes they don't. I would start by
> >> investigating the connection to the backend services, not investigating
> >> some random tunables (See my paragraph above.).
> > 
> > Actually when you have an error, the termination flags are the most
> > important thing to look at as they indicate what was wrong and where/when.
> 
> But still the termination flags do not point me to the *real* issue.
> They are relatively coarse grained.

If they indicate that an overflow occurred in the request or the response,
and you have the information for each and every request, you may find that
it's quite useful, especially when you have to match this against side
effects affecting these requests. The fact that they are harder to spot
is a different issue.

> > Just out of curiosity, what do you check more often, in order of priority,
> > among :
> >   - stats page
> >   - show info on CLI
> >   - traffic logs
> >   - admin logs
> >   - other
> > 
> > Because in fact that might help figure where such painful failures would
> > need to be shown (possibly multiple places).
> 
> Primarily munin, because it shows all my services at a glance. Munin
> uses the stats socket.

OK good. This argues in favor of a per-frontend, per-backend counter
that Munin can check and report when it increases.
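
To make the idea concrete, here is a minimal sketch of how a Munin-style
check could watch such a counter through the stats socket. It is only an
illustration under assumptions: the column name "hdr_ovf" is hypothetical
(no such counter exists yet), and the socket path is assumed to be the one
the Debian package configures. The only things it relies on are that
"show stat" returns CSV and that its header line starts with "# ".

```python
import csv
import io
import socket


def query_stats(sock_path="/run/haproxy/admin.sock"):
    """Send 'show stat' to the stats socket and return the raw CSV text.

    The default path is an assumption; use whatever the 'stats socket'
    line of the configuration points to.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(b"show stat\n")
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:  # haproxy closes the connection after the reply
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", "replace")


def parse_counter(csv_text, field):
    """Return {(pxname, svname): int} for one column, proxies only."""
    text = csv_text.lstrip()
    if text.startswith("# "):
        text = text[2:]  # drop the comment marker in front of the header
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Keep only the per-frontend and per-backend summary lines.
        if row.get("svname") not in ("FRONTEND", "BACKEND"):
            continue
        result[(row["pxname"], row["svname"])] = int(row.get(field) or "0")
    return result


def increased(previous, current):
    """Return the proxies whose counter grew between two samples."""
    return {key: current[key] - previous.get(key, 0)
            for key in current
            if current[key] > previous.get(key, 0)}
```

A check would keep the previous sample around, call
parse_counter(query_stats(), "hdr_ovf") on each run, and report whatever
increased() returns. Counters start again from zero after a reload, so a
sample lower than the previous one is simply ignored here.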

> Next would be the syslog [1]. I use the default Debian packaged logging
> set up. I think it places both traffic as well as admin logs into the
> same file. I have `log global` in my default section and no specific
> logs for frontends / backends.
> 
> Last would be the stats page. I use this primarily after reboots to
> verify all my backends are properly UP. It's not much use to me for
> "long term" information, because I unconditionally reload haproxy after
> running certbot renew. Thus my haproxy instance is reloaded once a day.
> Too much hassle to pipe in the new certificates via the admin socket.

OK, pretty clear. So in short by having this per-proxy counter, we can
satisfy users like you (via Munin) and me (via the stats page).

> I don't use any other tools to retrieve information.
> 
> [1] I'd love to have a proper integration with systemd-journald to have
> all my logs in one place. It's pretty annoying, because some things
> ("Proxy bk_*** started"; [WARNING] 121/202559 (11635) : Reexecuting
> Master process) go to systemd-journald (probably printed to stdout /
> stderr) and everything else goes into syslog.

Well, you should probably tell that to the guy who instead of learning
how to use a unix-like system decided it was easier to break everything
in it that used to be pretty simple, stable, reliable and clear for 40
years before he forced his crap into almost every Linux distro to the
point of making them even less observable and debuggable than Windows
nowadays :-(

What you have above looks like stderr. The rest are logs. They serve very
different purposes: stderr is there to inform you that something went
wrong during a reload operation (which systemd happily hides so that you
believe it was OK when it was not), while the logs are there for future
traffic analysis and troubleshooting.

And the reason journalctl is this slow very likely lies in its original
purpose which is just to log daemons' output during startup (since it was
confiscated by the tools). It's totally unusable for anything like moderate
to high traffic.

Going back to the initial subject, are you interested in seeing if you
can add a warning counter to each frontend/backend, and possibly a
rate-limited warning in the logs as well? I'm willing to help if needed,
it's just that I really cannot take care of this myself, given that I spent
the last 6 months dealing with bugs and various other discussions and have
barely been able to start on anything for the next release :-/ So any help
here is welcome, as you can guess.
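
As a rough illustration of the "counter plus rate-limited warning" idea,
the sketch below always increments the counter but emits at most one
warning per interval, mentioning how many events were suppressed in
between. This is not haproxy code (haproxy is written in C); the class
name, the message text and the interval are all hypothetical and only
meant to show the logic.

```python
import time


class RateLimitedWarning:
    """Count every event, but emit at most one warning per interval."""

    def __init__(self, min_interval, emit=print, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between two warnings
        self.emit = emit                  # where the warning text goes
        self.clock = clock                # injectable for testing
        self.total = 0                    # the counter, always incremented
        self.suppressed = 0               # events since the last warning
        self.last_emit = None

    def hit(self, proxy_name):
        """Record one event; return True if a warning was emitted."""
        self.total += 1
        now = self.clock()
        if (self.last_emit is not None
                and now - self.last_emit < self.min_interval):
            self.suppressed += 1
            return False
        # Hypothetical message text, for illustration only.
        self.emit("proxy %s: buffer overflow while processing headers "
                  "(%d similar events suppressed)"
                  % (proxy_name, self.suppressed))
        self.suppressed = 0
        self.last_emit = now
        return True
```

One instance per frontend/backend keeps the limit per proxy, so a noisy
proxy cannot hide warnings from a quiet one, while the "total" field stays
exact for the stats.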

Thanks!
Willy
