Willy,

On 03.05.2018 at 05:23, Willy Tarreau wrote:
>> To me a message like: "Unable to add-header Content-Security-Policy to
>> response. Possibly the amount of headers exceeds tune.maxrewrite." would
>> have been more helpful than random 502 without any further information.
> 
> We could possibly emit something like this with a warning level, just like
> when a server goes down. However we need to rate-limit it as you don't want
> your admin log to grow at 1000/s when you're flooded with large bogus
> requests that repeatedly cause this to happen.

Yes, I agree.
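
To make the rate limiting concrete, here is a minimal sketch in Python
(not haproxy's C, and not how haproxy implements anything). The class
name, the `burst` / `period` parameters and the summary line are all
made up for illustration: at most `burst` warnings are emitted per
`period` seconds, and the rest are only counted.

```python
import time


class RateLimitedWarning:
    """Emit at most `burst` warnings per `period` seconds; count the rest.

    Hypothetical helper for illustration only.
    """

    def __init__(self, burst=1, period=60.0, clock=time.monotonic, emit=print):
        self.burst = burst
        self.period = period
        self.clock = clock          # injectable so the behaviour can be tested
        self.emit = emit            # where the warning text goes
        self.window_start = None
        self.emitted = 0
        self.suppressed = 0

    def warn(self, message):
        """Return True if the message was emitted, False if it was suppressed."""
        now = self.clock()
        if self.window_start is None or now - self.window_start >= self.period:
            # A new window starts: report what was dropped in the previous one.
            if self.suppressed:
                self.emit(f"(suppressed {self.suppressed} similar warnings)")
            self.window_start = now
            self.emitted = 0
            self.suppressed = 0
        if self.emitted < self.burst:
            self.emitted += 1
            self.emit(message)
            return True
        self.suppressed += 1
        return False
```

With something like this the admin log would get one line per period
plus a count of what was dropped, instead of 1000 lines per second.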

>> Especially since the issue happens randomly: Sometimes the additional
>> headers fit by chance. Sometimes they don't. I would start by
>> investigating the connection to the backend services, not investigating
>> some random tunables (see my paragraph above).
> 
> Actually when you have an error, the termination flags are the most important
> thing to look at as they indicate what was wrong and where/when.

But still, the termination flags do not point me to the *real* issue.
They are relatively coarse-grained.

>> Actually I'm not sure
>> whether a 502 would even be the correct response. The issue is not with
>> the backend, but with the proxy. I'd expect a 500 here:
>>
>>>    The 502 (Bad Gateway) status code indicates that the server, while
>>>    acting as a gateway or proxy, received an *invalid response from an
>>>    inbound server* it accessed while attempting to fulfill the request.
>>
>> (highlighting mine)
> 
> It could as well, but arguably it could also be said that the frontend
> never received a valid response from the backend since this one failed
> to transform a valid response into another valid one.

Depending on the definition of valid. To me the 502 implies looking into
the backend service first, not into haproxy. But let's not bikeshed
about this.

>> After digging into it I might be able to deduce that the addition of the
>> new `http-response add-header` line caused the issues. But still I would
>> be none the wiser. I would have to stumble upon the tunable by accident.
>> Or ask on the list, like I did.
> 
> Just out of curiosity, what do you check more often, in order of priority,
> among :
>   - stats page
>   - show info on CLI
>   - traffic logs
>   - admin logs
>   - other
> 
> Because in fact that might help figure where such painful failures would
> need to be shown (possibly multiple places).

Primarily munin, because it shows all my services at a glance. Munin
uses the stats socket.
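
Roughly, a munin-style check on the stats socket boils down to the
sketch below. The function names and the socket path in the usage note
are assumptions for illustration; `show stat` and its CSV output
(header line starting with "# ", trailing comma on each line) are what
the stats socket provides. `eresp` is just a stand-in column here, I am
not claiming that rewrite failures are counted in it.

```python
import csv
import io
import socket


def query_stats_socket(path, command="show stat"):
    """Send one command to the haproxy stats socket and return the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(command.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:            # haproxy closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def column_by_proxy(show_stat_output, column):
    """Map (pxname, svname) to the integer value of one CSV column.

    Rows where the column is empty are skipped.
    """
    text = show_stat_output.lstrip()
    if text.startswith("# "):       # the header line is prefixed with "# "
        text = text[2:]
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        value = row.get(column)
        if value:
            result[(row["pxname"], row["svname"])] = int(value)
    return result
```

Usage would be something like
`column_by_proxy(query_stats_socket("/run/haproxy/admin.sock"), "eresp")`,
with the path adjusted to wherever the stats socket actually lives.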

Next would be the syslog [1]. I use the default Debian packaged logging
setup. I think it places both traffic and admin logs into the
same file. I have `log global` in my default section and no specific
logs for frontends / backends.

Last would be the stats page. I use this primarily after reboots to
verify all my backends are properly UP. It's not much use to me for
"long term" information, because I unconditionally reload haproxy after
running certbot renew. Thus my haproxy instance is reloaded once a day.
Too much hassle to pipe in the new certificates via the admin socket.

I don't use any other tools to retrieve information.

[1] I'd love to have a proper integration with systemd-journald to have
all my logs in one place. It's pretty annoying, because some things
("Proxy bk_*** started"; [WARNING] 121/202559 (11635) : Reexecuting
Master process) go to systemd-journald (probably printed to stdout /
stderr) and everything else goes into syslog.

>> I want to note at this point that I'm not running haproxy at scale or
>> with serious monitoring. The haproxy instance I'm experiencing this
>> issue with is my personal server, not some company or business one. It
>> runs my mail and some side / hobby projects. My needs or expectations
>> might be different.
> 
> That's an important point. It's the same for me on 1wt.eu or haproxy.org,
> sometimes I figure late that there are errors. Most often it's the component
> we forget about because it works and we don't spend time looking at the logs.
> Then probably, like me, you're looking at the stats page once in a while, and
> only at the logs when stats look suspicious ?
> 
> We already have "Warnings" columns in the stats page which are unused for
> the frontends, we could use it to report a count of such failures. Or we
> could add an extra "rewrite" column under "warnings" to report such errors
> where they were detected.
> 

As noted above the stats page is useless to me. Most useful to me would
be something munin could detect, because it would send me a mail.

Actually the first thing I would notice is if haproxy died, because then
my mail does not work either. I put haproxy in front of my Dovecot.
But that's a bit drastic I think. :-)

Best regards
Tim Düsterhus
