Hi Yann,
2018-01-21 4:42 GMT-08:00 Yann Ylavic <[email protected]>:
>
> Reusing connections means that we can have ThreadsPerChild sockets per
> backend (per child process).
> Is it really a leak (the more requests, the more sockets, never
> closed), or does it stop at ThreadsPerChild?
>
This was a red herring due to my ignorance about lsof and how it reports
fds opened by process and its threads, I don't believe that there is any
leak of fds.
> My patch simply disables connections recycling/reuse for Unix sockets,
> but it's probably not the right since that's supposed to work.
> Could it be that php-fpm simply does not like connections
> keepalive/reuse? There may be a reason why we disablereuse by default
> in mod_proxy_fcgi...
> Where do communications stop working at 60%, something in the error
> log? Possibly some tcpdump/wireshark between proxy_fcgi and the
> backend could help here too.
>
I did some testing with different httpd/php-fpm configurations and I
noticed something interesting. I removed mod_http2 and used a static/fixed
number of php-fpm workers to simplify the picture, and I was able to
reproduce the issue way before reaching the 60% limit using http/1.1. In
the logs I can clearly see that one httpd process/thread stops logging
after sending the environment variables, and gdb shows this:
Thread 2 (Thread 0x7f99ecf19700 (LWP 21755)):
#0 0x00007f99f60b76ad in poll () at ../sysdeps/unix/syscall-template.S:84
#1 0x00007f99f6e024f4 in apr_poll (aprset=aprset@entry=0x7f99ecf16a00,
num=num@entry=1, nsds=nsds@entry=0x7f99ecf169b4, timeout=<optimized
out>)
at poll/unix/poll.c:120
#2 0x00007f99f0b45893 in dispatch (conn=conn@entry=0x56217d285a60,
conf=conf@entry=0x56217d1c6800, r=r@entry=0x7f99d8007820,
setaside_pool=0x7f99d800d898, err=err@entry=0x7f99ecf18b40,
bad_request=bad_request@entry=0x7f99ecf18ae4,
has_responded=0x7f99ecf18ae8,
request_id=1) at mod_proxy_fcgi.c:563
#3 0x00007f99f0b46e32 in fcgi_do_request (p=<optimized out>, origin=0x0,
uri=<optimized out>, url=<optimized out>, server_portstr=0x7f99ecf18b20
":8443",
conf=0x56217d1c6800, conn=0x56217d285a60, r=0x7f99d8007820) at
mod_proxy_fcgi.c:936
#4 proxy_fcgi_handler (r=0x7f99d8007820, worker=<optimized out>,
conf=<optimized out>,
url=<optimized out>, proxyname=<optimized out>, proxyport=<optimized
out>)
at mod_proxy_fcgi.c:1050
[..]
And all the php-fpm workers this (sadly without debug symbols):
Thread 1 (Thread 0x7fe67a0bcd40 (LWP 21882)):
#0 0x00007fe676b2a1f0 in __read_nocancel () at
../sysdeps/unix/syscall-template.S:84
#1 0x00000000007a9734 in ?? ()
#2 0x00000000007aad60 in fcgi_accept_request ()
#3 0x0000000000464424 in main ()
strace shows the php-fpm process blocked in read(4,
Interesting stats running h2load:
* -n100000 -c10 -m10 https://etc..
With http/2 I can see the h2load reaching 100%, meanwhile with http/1.1 it
doesn't even reach 10% before blocking like described above.
* -n100000 -c100 -m10 https://etc..
Problems showing up with http/2 too. In both case setting Timeout (or
ProxyTimeout) to a low value makes connections to get closed due to
inactivity and h2load slowly proceeds.
pstree of my testing environment:
|-httpd-+-httpd
| `-10*[httpd---21*[{httpd}]]
|-php5-fpm---10*[php5-fpm]
The impression that I have is that somehow mod_proxy_fcgi sends a FCGI
request over a (reused) unix socket's fd and waits polling the fcgi's
backend response, that in the meantime is doing something else. I am going
to do more tests in the following days, but if anybody has a clear idea
about what's happening please let me know :)
Thanks!
Luca