On 04/13/2017 02:28 PM, Olivier Houchard wrote:
> On Thu, Apr 13, 2017 at 12:59:38PM +0200, Conrad Hoffmann wrote:
>> On 04/13/2017 11:31 AM, Olivier Houchard wrote:
>>> On Thu, Apr 13, 2017 at 11:17:45AM +0200, Conrad Hoffmann wrote:
>>>> Hi Olivier,
>>>>
>>>> On 04/12/2017 06:09 PM, Olivier Houchard wrote:
>>>>> On Wed, Apr 12, 2017 at 05:50:54PM +0200, Olivier Houchard wrote:
>>>>>> On Wed, Apr 12, 2017 at 05:30:17PM +0200, Conrad Hoffmann wrote:
>>>>>>> Hi again,
>>>>>>>
>>>>>>> so I tried to get this to work, but haven't managed to yet. I also don't
>>>>>>> quite understand how this is supposed to work. The first haproxy
>>>>>>> process is started _without_ the -x option, is that correct? Where does
>>>>>>> that instance ever create the socket for transfer to later instances?
>>>>>>>
>>>>>>> I have it working now insofar as subsequent instances are spawned with
>>>>>>> the -x option on reload, but they just complain that they can't get
>>>>>>> anything from the unix socket (because, as far as I can tell, it's not
>>>>>>> there?). I also can't see the code path where this socket gets
>>>>>>> created, but I haven't had time to read all of it yet.
>>>>>>>
>>>>>>> Am I doing something wrong? Did anyone get this to work with the
>>>>>>> systemd-wrapper so far?
>>>>>>>
>>>>>>> Also, but this might be a coincidence, my test setup takes a huge
>>>>>>> performance penalty just by applying your patches (without any reloading
>>>>>>> whatsoever). Did this happen to anybody else? I'll send some numbers and
>>>>>>> more details tomorrow.
>>>>>>>
>>>>>>
>>>>>> OK, I can confirm the performance issues; I'm investigating.
>>>>>>
>>>>>
>>>>> Found it, I was messing with SO_LINGER when I shouldn't have been.
>>>>
>>>> <removed code for brevity>
>>>>
>>>> thanks a lot, I can confirm that the performance regression seems to be
>>>> gone!
>>>>
>>>> I am still having the other (conceptual) problem, though. Sorry if this is
>>>> just me holding it wrong or something, it's been a while since I dug
>>>> through the internals of haproxy.
>>>>
>>>> So, as I mentioned before, we use nbproc (12) and the systemd-wrapper,
>>>> which in turn starts haproxy in daemon mode, giving us a process tree like
>>>> this (path and file names shortened for brevity):
>>>>
>>>> \_ /u/s/haproxy-systemd-wrapper -f ./hap.cfg -p /v/r/hap.pid
>>>>     \_ /u/s/haproxy-master
>>>>         \_ /u/s/haproxy -f ./hap.cfg -p /v/r/hap.pid -Ds
>>>>         \_ /u/s/haproxy -f ./hap.cfg -p /v/r/hap.pid -Ds
>>>>         \_ /u/s/haproxy -f ./hap.cfg -p /v/r/hap.pid -Ds
>>>>         \_ /u/s/haproxy -f ./hap.cfg -p /v/r/hap.pid -Ds
>>>>         \_ /u/s/haproxy -f ./hap.cfg -p /v/r/hap.pid -Ds
>>>>         \_ /u/s/haproxy -f ./hap.cfg -p /v/r/hap.pid -Ds
>>>>         \_ /u/s/haproxy -f ./hap.cfg -p /v/r/hap.pid -Ds
>>>>         \_ /u/s/haproxy -f ./hap.cfg -p /v/r/hap.pid -Ds
>>>>         \_ /u/s/haproxy -f ./hap.cfg -p /v/r/hap.pid -Ds
>>>>         \_ /u/s/haproxy -f ./hap.cfg -p /v/r/hap.pid -Ds
>>>>         \_ /u/s/haproxy -f ./hap.cfg -p /v/r/hap.pid -Ds
>>>>         \_ /u/s/haproxy -f ./hap.cfg -p /v/r/hap.pid -Ds
>>>>
>>>> Now, in our config file, we have something like this:
>>>>
>>>> # expose admin socket for each process
>>>> stats socket ${STATS_ADDR} level admin process 1
>>>> stats socket ${STATS_ADDR}-2 level admin process 2
>>>> stats socket ${STATS_ADDR}-3 level admin process 3
>>>> stats socket ${STATS_ADDR}-4 level admin process 4
>>>> stats socket ${STATS_ADDR}-5 level admin process 5
>>>> stats socket ${STATS_ADDR}-6 level admin process 6
>>>> stats socket ${STATS_ADDR}-7 level admin process 7
>>>> stats socket ${STATS_ADDR}-8 level admin process 8
>>>> stats socket ${STATS_ADDR}-9 level admin process 9
>>>> stats socket ${STATS_ADDR}-10 level admin process 10
>>>> stats socket ${STATS_ADDR}-11 level admin process 11
>>>> stats socket ${STATS_ADDR}-12 level admin process 12
>>>>
>>>> Basically, we have a dedicated admin socket for each ("real") process, as we
>>>> need to be able to talk to each process individually. So I was wondering:
>>>> which admin socket should I pass as HAPROXY_STATS_SOCKET? I initially
>>>> thought it would have to be a special stats socket in the haproxy-master
>>>> process (which we currently don't have), but as far as I can tell from the
>>>> output of `lsof` the haproxy-master process doesn't even hold any FDs
>>>> anymore. Will this setup currently work with your patches at all? Do I need
>>>> to add a stats socket to the master process? Or would this require a list
>>>> of stats sockets to be passed, similar to the list of PIDs that gets passed
>>>> to new haproxy instances, so that each process can talk to the one from
>>>> which it is taking over the socket(s)? In case I need a stats socket for
>>>> the master process, what would be the directive to create it?
>>>>
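Talking to each process individually, as described above, means connecting to the stats socket whose `process N` matches. The following is a hypothetical Python sketch (the helper names are illustrative, not part of HAProxy) that derives the per-process socket paths from the naming scheme used in this config (`${STATS_ADDR}` for process 1, `${STATS_ADDR}-N` otherwise) and sends one CLI command, such as `show info`, over one of them.

```python
import socket


def stats_socket_path(base: str, proc: int) -> str:
    """Hypothetical helper: the naming scheme from this config.

    Process 1 listens on `base`; process N (N > 1) on `base-N`.
    """
    return base if proc == 1 else f"{base}-{proc}"


def cli_command(path: str, command: str, timeout: float = 2.0) -> str:
    """Send a single command to an HAProxy stats socket and return the reply.

    In non-interactive mode the CLI answers one command and then closes the
    connection, so we read until EOF.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.connect(path)
        s.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            chunk = s.recv(4096)
            if not chunk:  # server closed the connection: reply complete
                break
            chunks.append(chunk)
    return b"".join(chunks).decode()


def parse_show_info(text: str) -> dict[str, str]:
    """Turn `Key: value` lines (as printed by `show info`) into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            info[key.strip()] = value.strip()
    return info
```

For example, looping `proc` over 1..12 and calling `parse_show_info(cli_command(stats_socket_path("/tmp/haproxy.sock", proc), "show info"))` should report a different `Process_num` and `Pid` for each socket, assuming all twelve processes are running.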
>>>
>>> Hi Conrad,
>>>
>>> Any of those sockets will do. Each process is made to keep all the
>>> listening sockets open, even if the proxy is not bound to that specific
>>> process, precisely so that they can be transferred via the unix socket.
>>>
>>> Regards,
>>>
>>> Olivier
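Olivier's answer relies on the old process handing its open listening sockets to the new one over a unix socket. The standard POSIX mechanism for passing open file descriptors between processes is `SCM_RIGHTS` ancillary data on an `AF_UNIX` socket. The Python sketch below shows that mechanism between the two ends of a `socketpair()`; it illustrates the technique only and is not HAProxy's code or wire format. The 4-byte count prefix and the function names are assumptions made for the example.

```python
import array
import socket


def send_listeners(chan: socket.socket, fds: list[int]) -> None:
    """Send open file descriptors over an AF_UNIX socket as SCM_RIGHTS data.

    The 4-byte FD count in the payload is a hypothetical framing so the
    receiver can check it got everything.
    """
    payload = len(fds).to_bytes(4, "little")
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", fds))]
    chan.sendmsg([payload], ancillary)


def recv_listeners(chan: socket.socket, maxfds: int) -> list[int]:
    """Receive up to maxfds descriptors; the kernel installs new FD numbers."""
    fds = array.array("i")
    msg, ancdata, _flags, _addr = chan.recvmsg(
        4, socket.CMSG_SPACE(maxfds * fds.itemsize))
    expected = int.from_bytes(msg, "little")
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any truncated integer at the end of the control data.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    if len(fds) != expected:
        raise RuntimeError(f"expected {expected} sockets, got {len(fds)}")
    return list(fds)


if __name__ == "__main__":
    old_end, new_end = socket.socketpair()  # AF_UNIX stream pair on Unix
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # ephemeral port stands in for *:443
    listener.listen()
    send_listeners(old_end, [listener.fileno()])
    (fd,) = recv_listeners(new_end, maxfds=64)
    taken_over = socket.socket(fileno=fd)
    # Same kernel socket under a new FD number, hence the same bound address.
    print(fd != listener.fileno(), taken_over.getsockname() == listener.getsockname())
    for s in (taken_over, listener, old_end, new_end):
        s.close()
```

Because the receiver gets the same kernel socket under a new descriptor number, the bound address (and any connections already waiting in its accept queue) carries over to the new process.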
>>
>>
>> Thanks, I am finally starting to understand, but I think there still might
>> be a problem. I didn't see that initially, but when I use one of the
>> processes' existing admin sockets, it still fails with the following messages:
>>
>> 2017-04-13_10:27:46.95005 [WARNING] 102/102746 (14101) : We didn't get the expected number of sockets (expecting 48 got 37)
>> 2017-04-13_10:27:46.95007 [ALERT] 102/102746 (14101) : Failed to get the sockets from the old process!
>>
>> I have a suspicion about the possible reason. We have a two-tier setup, as
>> is often recommended here on the mailing list: 11 processes do (almost)
>> only SSL termination, then pass to a single process that does most of the
>> heavy lifting. These processes use different sockets, of course (we use
>> `bind-process 1` and `bind-process 2-X` in frontends). The message above is
>> from the first process, which is the non-SSL one. When using an admin
>> socket from any of the other processes, the message changes to "(expecting
>> 48 got 17)".
>>
>> I assume the patches are incompatible with such a setup at the moment?
>>
>> Thanks once more :)
>> Conrad
>
> Hmm, that should not happen, and I can't seem to reproduce it.
> Can you share the haproxy config file you're using? Is the number of sockets
> received always the same? How are you generating your load? Is it happening
> on each reload?
>
> Thanks a lot for going through this, this is really appreciated :)
I am grateful myself that you're helping me through this :)
So I stripped all the logic and backends from our config file (below). It's
still quite big, and it still works in our environment, which is
unfortunately quite complex. I can also still reliably reproduce the error
with this config. The numbers seem to be consistently the same (except for
the difference between the first process and the others).
I am not sure it makes sense for you to recreate the environment we run
this in. The variables used in the config file are set to the following
values:
BASE_DIR=/etc/sv/ampelmann
HTTP_PORT=80
HTTPS_PORT=443
HEALTH_PORT=8081
LOCAL_FRONTEND_ADDR=/haproxy-frontend-local.sock
SYSLOG_ADDR=/dev/log
STATS_ADDR=/tmp/haproxy.sock
TEST_ADDR=127.0.0.1:9082
HAPROXY_STATS_SOCKET=/tmp/haproxy.sock
But maybe it is easier to reduce the config file even more, perhaps even
getting rid of the chroot and such. I am also happy to compile with debug
statements in certain places, or whatever else would make this easier. I'll
try to spend some more time understanding your code; maybe then I can be of
more help, but I'm not sure when I'll have the time for that given the
Easter holidays.
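For reference, the reload flow discussed in this thread (new instances started with `-x` pointing at one of the old processes' admin sockets, taken from `HAPROXY_STATS_SOCKET`) can be sketched as a command-line builder. This is a hypothetical Python helper, not the actual haproxy-systemd-wrapper (which is written in C); it assumes the wrapper adds `-x` only when `HAPROXY_STATS_SOCKET` is set and passes the old PIDs from the pidfile to `-sf`, which must be the last option.

```python
from typing import Mapping, Sequence


def read_pids(pidfile_text: str) -> list[int]:
    """Parse a pidfile written via -p: one PID per line (12 with nbproc 12)."""
    return [int(tok) for tok in pidfile_text.split()]


def reload_argv(haproxy: str, cfg: str, pidfile: str,
                old_pids: Sequence[int], env: Mapping[str, str]) -> list[str]:
    """Hypothetical: build the argv for a new instance taking over old_pids."""
    argv = [haproxy, "-f", cfg, "-p", pidfile, "-Ds"]
    stats = env.get("HAPROXY_STATS_SOCKET")
    if stats:
        # Per this thread, any process's admin socket should do, since every
        # process keeps all listening sockets open.
        argv += ["-x", stats]
    if old_pids:
        # -sf takes a list of PIDs and must come last on the command line.
        argv += ["-sf", *(str(pid) for pid in old_pids)]
    return argv
```

With the values above, `reload_argv("/usr/sbin/haproxy", "./hap.cfg", "/var/run/hap.pid", read_pids(pidfile_text), os.environ)` would append `-x /tmp/haproxy.sock` followed by `-sf` and the twelve old PIDs.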
Thanks a lot for looking at this,
Conrad
--
Conrad Hoffmann
Traffic Engineer
SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany
Managing Director: Alexander Ljung | Incorporated in England & Wales
with Company No. 6343600 | Local Branch Office | AG Charlottenburg |
HRB 110657B
# vim: et ts=2 sw=2 ft=haproxy
global
nbproc 12
maxconn 100000
log ${SYSLOG_ADDR} local1 info
log-tag ampelmann
spread-checks 5
# expose admin socket for each process
stats socket ${STATS_ADDR} level admin process 1
stats socket ${STATS_ADDR}-2 level admin process 2
stats socket ${STATS_ADDR}-3 level admin process 3
stats socket ${STATS_ADDR}-4 level admin process 4
stats socket ${STATS_ADDR}-5 level admin process 5
stats socket ${STATS_ADDR}-6 level admin process 6
stats socket ${STATS_ADDR}-7 level admin process 7
stats socket ${STATS_ADDR}-8 level admin process 8
stats socket ${STATS_ADDR}-9 level admin process 9
stats socket ${STATS_ADDR}-10 level admin process 10
stats socket ${STATS_ADDR}-11 level admin process 11
stats socket ${STATS_ADDR}-12 level admin process 12
user haproxy
group haproxy
chroot ./chroot
defaults
mode http
maxconn 100000
default-server weight 100 inter 30s agent-inter 10s
option dontlognull
option forwardfor except 127.0.0.1 if-none
option redispatch
option abortonclose
timeout http-request 10s
timeout http-keep-alive 120s
timeout queue 10s
timeout connect 5s
timeout client 120s
timeout server 55s
timeout check 5s
http-reuse safe
unique-id-format %{+X}o\ %Ts:%pid:%rc:%rt
errorfile 403 403.http
errorfile 504 504.http
errorfile 503 503.http
errorfile 502 502.http
# The extern proxy performs SSL termination, content compression and common
# request transformations. All requests are then forwarded to the internal
# frontend for logging and routing.
listen public
bind-process 2-32
bind *:${HTTPS_PORT} process 2 ssl crt ./wildcard.soundcloud.com.pem crt ./wildcard.sndcdn.com.pem crt ./wildcard.s-cloud.net.pem crt ./exit.sc.pem ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-RC4-SHA:ECDHE-RSA-AES128-SHA:AES128-GCM-SHA256:RC4:HIGH:!MD5:!aNULL:!EDH:!CAMELLIA no-sslv3
bind *:${HTTP_PORT} process 2
bind *:${HTTPS_PORT} process 3 ssl crt ./wildcard.soundcloud.com.pem crt ./wildcard.sndcdn.com.pem crt ./wildcard.s-cloud.net.pem crt ./exit.sc.pem ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-RC4-SHA:ECDHE-RSA-AES128-SHA:AES128-GCM-SHA256:RC4:HIGH:!MD5:!aNULL:!EDH:!CAMELLIA no-sslv3
bind *:${HTTP_PORT} process 3
bind *:${HTTPS_PORT} process 4 ssl crt ./wildcard.soundcloud.com.pem crt ./wildcard.sndcdn.com.pem crt ./wildcard.s-cloud.net.pem crt ./exit.sc.pem ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-RC4-SHA:ECDHE-RSA-AES128-SHA:AES128-GCM-SHA256:RC4:HIGH:!MD5:!aNULL:!EDH:!CAMELLIA no-sslv3
bind *:${HTTP_PORT} process 4
bind *:${HTTPS_PORT} process 5 ssl crt ./wildcard.soundcloud.com.pem crt ./wildcard.sndcdn.com.pem crt ./wildcard.s-cloud.net.pem crt ./exit.sc.pem ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-RC4-SHA:ECDHE-RSA-AES128-SHA:AES128-GCM-SHA256:RC4:HIGH:!MD5:!aNULL:!EDH:!CAMELLIA no-sslv3
bind *:${HTTP_PORT} process 5
bind *:${HTTPS_PORT} process 6 ssl crt ./wildcard.soundcloud.com.pem crt ./wildcard.sndcdn.com.pem crt ./wildcard.s-cloud.net.pem crt ./exit.sc.pem ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-RC4-SHA:ECDHE-RSA-AES128-SHA:AES128-GCM-SHA256:RC4:HIGH:!MD5:!aNULL:!EDH:!CAMELLIA no-sslv3
bind *:${HTTP_PORT} process 6
bind *:${HTTPS_PORT} process 7 ssl crt ./wildcard.soundcloud.com.pem crt ./wildcard.sndcdn.com.pem crt ./wildcard.s-cloud.net.pem crt ./exit.sc.pem ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-RC4-SHA:ECDHE-RSA-AES128-SHA:AES128-GCM-SHA256:RC4:HIGH:!MD5:!aNULL:!EDH:!CAMELLIA no-sslv3
bind *:${HTTP_PORT} process 7
bind *:${HTTPS_PORT} process 8 ssl crt ./wildcard.soundcloud.com.pem crt ./wildcard.sndcdn.com.pem crt ./wildcard.s-cloud.net.pem crt ./exit.sc.pem ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-RC4-SHA:ECDHE-RSA-AES128-SHA:AES128-GCM-SHA256:RC4:HIGH:!MD5:!aNULL:!EDH:!CAMELLIA no-sslv3
bind *:${HTTP_PORT} process 8
bind *:${HTTPS_PORT} process 9 ssl crt ./wildcard.soundcloud.com.pem crt ./wildcard.sndcdn.com.pem crt ./wildcard.s-cloud.net.pem crt ./exit.sc.pem ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-RC4-SHA:ECDHE-RSA-AES128-SHA:AES128-GCM-SHA256:RC4:HIGH:!MD5:!aNULL:!EDH:!CAMELLIA no-sslv3
bind *:${HTTP_PORT} process 9
bind *:${HTTPS_PORT} process 10 ssl crt ./wildcard.soundcloud.com.pem crt ./wildcard.sndcdn.com.pem crt ./wildcard.s-cloud.net.pem crt ./exit.sc.pem ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-RC4-SHA:ECDHE-RSA-AES128-SHA:AES128-GCM-SHA256:RC4:HIGH:!MD5:!aNULL:!EDH:!CAMELLIA no-sslv3
bind *:${HTTP_PORT} process 10
bind *:${HTTPS_PORT} process 11 ssl crt ./wildcard.soundcloud.com.pem crt ./wildcard.sndcdn.com.pem crt ./wildcard.s-cloud.net.pem crt ./exit.sc.pem ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-RC4-SHA:ECDHE-RSA-AES128-SHA:AES128-GCM-SHA256:RC4:HIGH:!MD5:!aNULL:!EDH:!CAMELLIA no-sslv3
bind *:${HTTP_PORT} process 11
bind *:${HTTPS_PORT} process 12 ssl crt ./wildcard.soundcloud.com.pem crt ./wildcard.sndcdn.com.pem crt ./wildcard.s-cloud.net.pem crt ./exit.sc.pem ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-RC4-SHA:ECDHE-RSA-AES128-SHA:AES128-GCM-SHA256:RC4:HIGH:!MD5:!aNULL:!EDH:!CAMELLIA no-sslv3
bind *:${HTTP_PORT} process 12
compression type text/html text/plain text/css text/javascript application/xml application/json application/x-javascript application/javascript application/ecmascript application/rss+xml application/atomsvc+xml application/atom+xml application/msword application/vnd.ms-excel application/vnd.ms-powerpoint
compression algo gzip
unique-id-header X-Request-Id
server http ${LOCAL_FRONTEND_ADDR} send-proxy
# The internal frontend performs the content based routing and logging.
# The redirects happen here so that they get logged (see TRAF-153).
frontend internal
bind-process 1
bind ${BASE_DIR}/chroot${LOCAL_FRONTEND_ADDR} accept-proxy user haproxy group haproxy
log global
option httplog
default_backend deny
backend tarpit
timeout tarpit 30s
reqtarpit .
backend deny
http-request deny if { always_true }
backend test
server 1 ${TEST_ADDR}
## monitor frontend
listen monitor
no log
bind *:${HEALTH_PORT}
monitor-uri /health
## expose stats on ports base..base+nbproc-1 (5000-5011)
listen stats-0
no log
bind *:5000
bind-process 1
stats enable
stats uri /
listen stats-1
no log
bind *:5001
bind-process 2
stats enable
stats uri /
listen stats-2
no log
bind *:5002
bind-process 3
stats enable
stats uri /
listen stats-3
no log
bind *:5003
bind-process 4
stats enable
stats uri /
listen stats-4
no log
bind *:5004
bind-process 5
stats enable
stats uri /
listen stats-5
no log
bind *:5005
bind-process 6
stats enable
stats uri /
listen stats-6
no log
bind *:5006
bind-process 7
stats enable
stats uri /
listen stats-7
no log
bind *:5007
bind-process 8
stats enable
stats uri /
listen stats-8
no log
bind *:5008
bind-process 9
stats enable
stats uri /
listen stats-9
no log
bind *:5009
bind-process 10
stats enable
stats uri /
listen stats-10
no log
bind *:5010
bind-process 11
stats enable
stats uri /
listen stats-11
no log
bind *:5011
bind-process 12
stats enable
stats uri /
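The "expecting 48" in the warning quoted earlier matches the number of listeners declared in this config, assuming the new process expects one file descriptor per `bind` line and per `stats socket` line (an assumption, not checked against the patches). The Python sketch below counts them that way and shows the per-section breakdown; `count_listeners` is a hypothetical helper that only understands one-directive-per-line configs.

```python
def count_listeners(cfg_text: str) -> int:
    """Count `bind` and `stats socket` directives (assumed: one FD each)."""
    count = 0
    for raw in cfg_text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue  # blank line or comment
        if words[0] == "bind":  # "bind-process" is a different keyword
            count += 1
        elif words[:2] == ["stats", "socket"]:
            count += 1
    return count


# Breakdown of the config above: nbproc 12, SSL termination on processes 2-12.
NBPROC = 12
SSL_PROCS = len(range(2, NBPROC + 1))  # 11 processes
BREAKDOWN = {
    "global: one stats socket per process": NBPROC,
    "listen public: HTTPS + HTTP bind per SSL process": 2 * SSL_PROCS,
    "frontend internal: unix socket bind": 1,
    "listen monitor: health-check bind": 1,
    "listen stats-0 .. stats-11: one bind each": NBPROC,
}

if __name__ == "__main__":
    for section, n in BREAKDOWN.items():
        print(f"{n:3d}  {section}")
    print(f"{sum(BREAKDOWN.values()):3d}  total")
```

Running `count_listeners` over the attached config text gives 48 as well (12 + 22 + 1 + 1 + 12); lines that start with `bind-process`, `stats enable` or `stats uri` are not counted.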