On 6/6/15, Matthew Martin <[email protected]> wrote:
> On 6/6/15, Florian Obser <[email protected]> wrote:
>> On Sat, Jun 06, 2015 at 07:05:46PM +0000, Florian Obser wrote:
>>> On Fri, Jun 05, 2015 at 05:15:09PM -0500, Matthew Martin wrote:
>>> > >Synopsis: Serving large files with httpd eats fscktons of memory.
>>> > >Category: system
>>> > >Environment:
>>> > 	System      : OpenBSD 5.7
>>> > 	Details     : OpenBSD 5.7 (GENERIC) #0: Thu Apr 30 22:01:01 CEST 2015
>>> > 	    [email protected]:/binpatchng/work-binpatch57-amd64/src/sys/arch/amd64/compile/GENERIC
>>> >
>>> > 	Architecture: OpenBSD.amd64
>>> > 	Machine     : amd64
>>> > >Description:
>>> > A couple of people concurrently downloading iso-sized files can drag
>>> > the server into using all of swap (there's only a gig) in a
>>> > frighteningly short time.
>>> >
>>> > Revision 1.30 of
>>> > http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.sbin/httpd/server_file.c
>>> > had the commit message "Adjust the read/write watermarks according to
>>> > the TCP send buffer. This fixes sending of large files. Previously,
>>> > httpd was reading the input file too quickly and could run out of
>>> > memory when filling the input buffer."
>>> >
>>> > That pretty well describes the situation I'm seeing now with 5.7
>>> > (-RELEASE + m:tier binpatches). It also happens with -CURRENT (as of
>>> > Jun 5) httpd compiled and slapped in its place.
>>> >
>>> > >How-To-Repeat:
>>> > Something along the lines of
>>> > hostA# dd if=/dev/zero of=/var/www/htdocs/test_512.dat bs=1M count=512
>>> > hostB$ ftp http://hostA/test_512.dat
>>> >
>>> > >Fix:
>>> > I've attempted various settings of 'tcp socket buffer' and shotgunning
>>> > around the lines modified in 1.30 to no avail. Trying to limit its
>>> > consumption via login.conf just makes it die when it hits the limit.
>>>
>>> the problem is that we have two bufferevents: clt->clt_bev, which is
>>> handling the accept(2)ed socket, and clt->clt_srvbev, which is
>>> handling the fd of the served file.
>>>
>>> it goes like this:
>>> clt_bev: Here I am, brain the size of a planet, and they ask me to serve
>>> a file.
>>> clt_srvbev: Whee, I can read clt_sndbufsiz bytes from the file. Hello
>>> clt_bev, here, have some data.
>>> clt_bev: Ok, I can't send it right now, but I have a buffer, I'll put
>>> the data at the end.
>>> [last message repeated n times]
>>> clt_bev: Phew, I can send some data to the client, hopefully clt_srvbev
>>> stops sending me data soon.
>>>
>>> and so on... In the end the whole iso ends up in clt_bev.
>>>
>>> I'm not sure yet about the 32 and 256 magic numbers; they seem to hit
>>> some kind of sweet spot on *my* system. Make them too small and
>>> everything slows down, make them too large and you waste memory.
>>>
>>> Please try this:
>>>
>>
>> this gets rid of a compiler warning:
>>
>> diff --git server.c server.c
>> index ca67a47..8725e2c 100644
>> --- server.c
>> +++ server.c
>> @@ -704,7 +704,7 @@ server_input(struct client *clt)
>>
>>  	/* Adjust write watermark to the socket buffer output size */
>>  	bufferevent_setwatermark(clt->clt_bev, EV_WRITE,
>> -	    clt->clt_sndbufsiz, 0);
>> +	    32 * clt->clt_sndbufsiz, 0);
>>  	/* Read at most amount of data that fits in one fcgi record. */
>>  	bufferevent_setwatermark(clt->clt_bev, EV_READ, 0, FCGI_CONTENT_SIZE);
>>
>> @@ -729,6 +729,10 @@ server_write(struct bufferevent *bev, void *arg)
>>  		goto done;
>>
>>  	bufferevent_enable(bev, EV_READ);
>> +
>> +	if (clt->clt_srvbev && !(clt->clt_srvbev->enabled & EV_READ))
>> +		bufferevent_enable(clt->clt_srvbev, EV_READ);
>> +
>>  	return;
>>  done:
>>  	(*bev->errorcb)(bev, EVBUFFER_WRITE|EVBUFFER_EOF, bev->cbarg);
>> @@ -769,6 +773,11 @@ server_read(struct bufferevent *bev, void *arg)
>>  		goto fail;
>>  	if (clt->clt_done)
>>  		goto done;
>> +
>> +	if (EVBUFFER_LENGTH(EVBUFFER_OUTPUT(clt->clt_bev)) > (size_t) 256 *
>> +	    clt->clt_sndbufsiz)
>> +		bufferevent_disable(bev, EV_READ);
>> +
>>  	return;
>>  done:
>>  	(*bev->errorcb)(bev, EVBUFFER_READ|EVBUFFER_EOF, bev->cbarg);
>>
>>
>> --
>> I'm not entirely sure you are real.
>>
>
> Works great.
> $ sudo pkg_delete apache-httpd # :)
>
> Here are some meaningless benchmarks. Disregard the grotesque CPU
> usage: it's a virtual machine under VirtualBox running on a commodity
> craptop with a slow disk, trying to read from one VM and write to
> another. Disregard everything else too, because who cares about
> benchmarks. But as there doesn't seem to be anything else too horrific
> I'll throw it on the live site and maybe get some of the other recent
> changes stressed a bit too. If I hit anything else I'll double check
> on an entirely -CURRENT system.
>
> #####
> -RELEASE + binpatch httpd:
> #####
>
> ### After booting, before doing anything: ###
> load averages:  0.57,  0.24,  0.10    test.vm.local 12:53:35
> 35 processes: 34 idle, 1 on processor
> 1 CPUs:  1.1% user,  0.0% nice,  1.5% system,  0.1% interrupt, 97.3% idle
> Memory: Real: 23M/73M act/tot Free: 165M Cache: 20M Swap: 0K/1024M
>
>   PID USERNAME PRI NICE  SIZE   RES STATE WAIT      TIME    CPU COMMAND
>     1 root      10    0  516K  524K sleep wait      0:01  0.00% init
> 20593 admin      2    0 1168K 2140K sleep kqread    0:00  0.00% tmux
> 21635 admin      2    0 3664K 2576K sleep select    0:00  0.00% sshd
> 23875 root       2    0 3768K 3260K idle  poll      0:00  0.00% sshd
> 20006 _syslogd   2    0 1100K 1416K sleep kqread    0:00  0.00% syslogd
> 28916 root      18    0  660K  776K sleep pause     0:00  0.00% ksh
> 16269 www        2    0 1152K 3316K sleep kqread    0:00  0.00% httpd
> 16187 _pflogd    4    0  684K  368K sleep bpf       0:00  0.00% pflogd
>  3341 root       2    0 1284K 2344K sleep kqread    0:00  0.00% httpd
> 29154 admin      2    0  876K 1856K idle  kqread    0:00  0.00% tmux
>  5628 _smtpq     2    0 1576K 2188K idle  kqread    0:00  0.00% smtpd
> 24627 www        2    0 1160K 3284K sleep kqread    0:00  0.00% httpd
>   339 www        2    0 1156K 3268K sleep kqread    0:00  0.00% httpd
> 28963 admin     18    0  632K  724K idle  pause     0:00  0.00% ksh
> 22747 _smtpd     2    0 1628K 2192K idle  kqread    0:00  0.00% smtpd
> 18664 root       2    0 1576K 2140K idle  kqread    0:00  0.00% smtpd
> 23311 root       3    0  400K 1156K idle  ttyin     0:00  0.00% getty
> 20789 admin     18    0  628K  740K idle  pause     0:00  0.00% ksh
>
>
> ### About 50% through transferring test_512.dat ###
> load averages:  1.76,  0.68,  0.28    test.vm.local 12:56:17
> 35 processes: 1 running, 33 idle, 1 on processor
> 1 CPUs:  6.6% user,  0.0% nice,  8.6% system,  4.9% interrupt, 79.8% idle
> Memory: Real: 116M/212M act/tot Free: 26M Cache: 21M Swap: 137M/1024M
>
>   PID USERNAME PRI NICE  SIZE   RES STATE WAIT      TIME    CPU COMMAND
> 16269 www       61    0  257M  114M run   -         0:51 67.29% httpd
>
>
> ### After completion. ###
> load averages:  1.62,  0.96,  0.42    test.vm.local 12:57:30
> 35 processes: 1 running, 33 idle, 1 on processor
> 1 CPUs:  5.4% user,  0.0% nice, 13.1% system,  7.5% interrupt, 73.9% idle
> Memory: Real: 2976K/48M act/tot Free: 190M Cache: 24M Swap: 17M/1024M
>
>   PID USERNAME PRI NICE  SIZE   RES STATE WAIT      TIME    CPU COMMAND
> 16269 www        2    0 1204K  784K idle  kqread    1:20 15.97% httpd
>
>
> #####
> -RELEASE + binpatch + -CURRENT (Jun 6th) httpd WITHOUT this patch:
> #####
>
> ### Before transfer. ###
> load averages:  0.52,  0.16,  0.06    test.vm.local 13:06:32
> 31 processes: 30 idle, 1 on processor
> 1 CPUs:  1.8% user,  0.0% nice,  3.0% system,  0.1% interrupt, 95.1% idle
> Memory: Real: 20M/67M act/tot Free: 171M Cache: 18M Swap: 0K/1024M
>
>   PID USERNAME PRI NICE  SIZE   RES STATE WAIT      TIME    CPU COMMAND
>     1 root      10    0  508K  508K idle  wait      0:01  0.00% init
> 15488 root       2    0 3768K 3256K sleep poll      0:00  0.00% sshd
> 13043 _syslogd   2    0 1136K 1456K sleep kqread    0:00  0.00% syslogd
> 27239 root       2    0 1280K 2312K idle  kqread    0:00  0.00% httpd
> 19145 www        2    0 1156K 3184K idle  kqread    0:00  0.00% httpd
> 12066 www        2    0 1160K 3152K idle  kqread    0:00  0.00% httpd
>  9620 www        2    0 1152K 3132K idle  kqread    0:00  0.00% httpd
>  5805 admin      2    0 3648K 2524K sleep select    0:00  0.00% sshd
> 15055 admin     18    0  628K  732K sleep pause     0:00  0.00% ksh
> 31667 _smtpq     2    0 1580K 2220K idle  kqread    0:00  0.00% smtpd
>    12 root       3    0  404K 1152K idle  ttyin     0:00  0.00% getty
> 22421 _pflogd    4    0  684K  360K sleep bpf       0:00  0.00% pflogd
> 18060 root       2    0 1580K 2144K idle  kqread    0:00  0.00% smtpd
> 27368 _smtpd     2    0 1512K 2080K idle  kqread    0:00  0.00% smtpd
>  2771 root       3    0  396K 1132K idle  ttyin     0:00  0.00% getty
> 18925 _smtpd     2    0 1432K 2004K idle  kqread    0:00  0.00% smtpd
>  1105 _smtpd     2    0 1568K 2728K idle  kqread    0:00  0.00% smtpd
> 30262 root       3    0  400K 1140K idle  ttyin     0:00  0.00% getty
>
>
> ### During transfer. ###
> load averages:  1.28,  0.40,  0.15    test.vm.local 13:08:06
> 31 processes: 1 running, 29 idle, 1 on processor
> 1 CPUs:  8.5% user,  0.0% nice,  6.6% system,  4.6% interrupt, 80.3% idle
> Memory: Real: 157M/238M act/tot Free: 508K Cache: 12M Swap: 64M/1024M
>
>   PID USERNAME PRI NICE  SIZE   RES STATE WAIT      TIME    CPU COMMAND
> 19145 www       36    0  385M  156M run   -         0:25 60.84% httpd
>
>
> ### During transfer. ###
> load averages:  1.65,  0.56,  0.22    test.vm.local 13:08:33
> 31 processes: 1 running, 29 idle, 1 on processor
> 1 CPUs:  7.2% user,  0.0% nice, 14.3% system,  8.4% interrupt, 70.1% idle
> Memory: Real: 115M/211M act/tot Free: 27M Cache: 19M Swap: 139M/1024M
>
>   PID USERNAME PRI NICE  SIZE   RES STATE WAIT      TIME    CPU COMMAND
> 19145 www       64    0  257M  114M run   -         0:44 65.53% httpd
>
>
> ### During transfer. ###
> load averages:  1.70,  0.63,  0.25    test.vm.local 13:08:54
> 31 processes: 1 running, 29 idle, 1 on processor
> 1 CPUs:  6.5% user,  0.0% nice, 18.7% system, 10.4% interrupt, 64.4% idle
> Memory: Real: 140M/225M act/tot Free: 13M Cache: 24M Swap: 210M/1024M
>
>   PID USERNAME PRI NICE  SIZE   RES STATE WAIT      TIME    CPU COMMAND
> 19145 www       57    0  257M  139M run   -         0:58 67.48% httpd
>
>
> ### After transfer. ###
> load averages:  0.93,  0.67,  0.29    test.vm.local 13:10:03
> 31 processes: 30 idle, 1 on processor
> 1 CPUs:  4.9% user,  0.0% nice, 18.2% system, 10.6% interrupt, 66.3% idle
> Memory: Real: 2332K/48M act/tot Free: 190M Cache: 24M Swap: 16M/1024M
>
>   PID USERNAME PRI NICE  SIZE   RES STATE WAIT      TIME    CPU COMMAND
> 19145 www        2    0 1244K  832K idle  kqread    1:13  6.64% httpd
>
>
> #####
> -RELEASE + binpatch + -CURRENT (Jun 6th) httpd WITH this patch.
> #####
>
> ### Before transfer. ###
> load averages:  0.40,  0.10,  0.03    test.vm.local 13:16:22
> 31 processes: 30 idle, 1 on processor
> 1 CPUs:  4.9% user,  0.0% nice,  8.6% system,  0.5% interrupt, 86.1% idle
> Memory: Real: 20M/67M act/tot Free: 171M Cache: 18M Swap: 0K/1024M
>
>   PID USERNAME PRI NICE  SIZE   RES STATE WAIT      TIME    CPU COMMAND
> 30213 root       2    0 3772K 3244K sleep poll      0:00  0.15% sshd
>     1 root      10    0  520K  524K sleep wait      0:01  0.00% init
>  1916 _syslogd   2    0 1096K 1440K sleep kqread    0:00  0.00% syslogd
> 21647 root       2    0 1284K 2296K sleep kqread    0:00  0.00% httpd
> 23629 www        2    0 1156K 3148K sleep kqread    0:00  0.00% httpd
>   717 www        2    0 1164K 3204K sleep kqread    0:00  0.00% httpd
> 19611 www        2    0 1156K 3152K sleep kqread    0:00  0.00% httpd
>  7000 admin      2    0 3652K 2520K sleep select    0:00  0.00% sshd
> 20000 _smtpq     2    0 1572K 2184K sleep kqread    0:00  0.00% smtpd
> 21661 root       2    0 1572K 2124K sleep kqread    0:00  0.00% smtpd
> 21614 admin     18    0  632K  740K sleep pause     0:00  0.00% ksh
> 18344 root       3    0  404K 1152K sleep ttyin     0:00  0.00% getty
> 23896 root       3    0  408K 1156K sleep ttyin     0:00  0.00% getty
> 25101 root       3    0  404K 1148K sleep ttyin     0:00  0.00% getty
>  8370 root       3    0  400K 1144K sleep ttyin     0:00  0.00% getty
> 17674 _smtpd     2    0 1624K 2212K sleep kqread    0:00  0.00% smtpd
> 21802 root       2    0  632K  616K sleep netio     0:00  0.00% pflogd
> 28181 root       3    0  396K 1144K sleep ttyin     0:00  0.00% getty
>
>
> ### During transfer ###
> load averages:  0.37,  0.17,  0.07    test.vm.local 13:17:51
> 31 processes: 1 running, 29 idle, 1 on processor
> 1 CPUs:  1.0% user,  0.0% nice,  3.2% system,  1.8% interrupt, 94.1% idle
> Memory: Real: 29M/81M act/tot Free: 157M Cache: 24M Swap: 0K/1024M
>
>   PID USERNAME PRI NICE  SIZE   RES STATE WAIT      TIME    CPU COMMAND
>   717 www       41    0 9444K   11M run   -         0:03 11.18% httpd
>
>
> ### During transfer ###
> load averages:  0.44,  0.19,  0.08    test.vm.local 13:18:09
> 31 processes: 1 running, 29 idle, 1 on processor
> 1 CPUs:  0.9% user,  0.0% nice,  5.6% system,  5.3% interrupt, 88.3% idle
> Memory: Real: 29M/81M act/tot Free: 157M Cache: 24M Swap: 0K/1024M
>
>   PID USERNAME PRI NICE  SIZE   RES STATE WAIT      TIME    CPU COMMAND
>   717 www       46    0 9444K   11M run   -         0:11 28.03% httpd
>
>
> ### During transfer ###
> load averages:  0.70,  0.43,  0.20    test.vm.local 13:21:47
> 31 processes: 30 idle, 1 on processor
> 1 CPUs:  0.4% user,  0.0% nice,  8.8% system, 10.3% interrupt, 80.4% idle
> Memory: Real: 29M/82M act/tot Free: 156M Cache: 24M Swap: 0K/1024M
>
>   PID USERNAME PRI NICE  SIZE   RES STATE WAIT      TIME    CPU COMMAND
>   717 www       -5    0 9440K   11M sleep getblk    0:39 23.97% httpd
>
>
> ### During transfer ###
> load averages:  0.73,  0.46,  0.22    test.vm.local 13:22:14
> 31 processes: 30 idle, 1 on processor
> 1 CPUs:  0.5% user,  0.0% nice,  9.6% system, 11.4% interrupt, 78.5% idle
> Memory: Real: 29M/82M act/tot Free: 156M Cache: 24M Swap: 0K/1024M
>
>   PID USERNAME PRI NICE  SIZE   RES STATE WAIT      TIME    CPU COMMAND
>   717 www        2    0 9440K   11M sleep kqread    0:50 34.33% httpd
>
>
> ### After transfer ###
> load averages:  0.25,  0.38,  0.21    test.vm.local 13:23:40
> 31 processes: 30 idle, 1 on processor
> 1 CPUs:  0.4% user,  0.0% nice,  8.2% system,  9.9% interrupt, 81.5% idle
> Memory: Real: 21M/74M act/tot Free: 164M Cache: 24M Swap: 0K/1024M
>
>   PID USERNAME PRI NICE  SIZE   RES STATE WAIT      TIME    CPU COMMAND
>   717 www        2    0 1248K 3348K idle  kqread    0:55  0.44% httpd
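
[Editor's note: the diff above boils down to a simple backpressure loop
between the two bufferevents. Here is a minimal sketch of that idea
against the libevent 1.x API that httpd builds with; the client struct,
the callbacks file_read/socket_write, the helper client_setup, and the
BACKLOG_HIWAT/RESUME_LOWAT factors are illustrative stand-ins, not
httpd's actual code.]

/*
 * Sketch of the backpressure pattern from the diff, using libevent 1.x.
 * The names clt_bev, clt_srvbev, and clt_sndbufsiz mirror httpd's
 * struct client; everything else here is hypothetical.
 */
#include <sys/types.h>
#include <sys/time.h>
#include <event.h>

#define BACKLOG_HIWAT	256	/* pause factor, as in the diff */
#define RESUME_LOWAT	32	/* resume factor, as in the diff */

struct client {
	struct bufferevent	*clt_bev;	/* accept(2)ed socket */
	struct bufferevent	*clt_srvbev;	/* fd of the served file */
	size_t			 clt_sndbufsiz;	/* TCP send buffer size */
};

/* File became readable: queue the data, pause if the backlog is deep. */
static void
file_read(struct bufferevent *bev, void *arg)
{
	struct client	*clt = arg;

	/* Move what was just read onto the socket's output queue. */
	bufferevent_write_buffer(clt->clt_bev, EVBUFFER_INPUT(bev));

	/* Too much queued and not yet on the wire?  Stop reading. */
	if (EVBUFFER_LENGTH(EVBUFFER_OUTPUT(clt->clt_bev)) >
	    BACKLOG_HIWAT * clt->clt_sndbufsiz)
		bufferevent_disable(bev, EV_READ);
}

/* Socket drained below the low watermark: resume reading the file. */
static void
socket_write(struct bufferevent *bev, void *arg)
{
	struct client	*clt = arg;

	if (clt->clt_srvbev != NULL)
		bufferevent_enable(clt->clt_srvbev, EV_READ);
}

static void
client_setup(struct client *clt)
{
	/*
	 * Fire socket_write while RESUME_LOWAT send buffers are still
	 * queued, so the file read is resumed before the pipe runs dry.
	 */
	bufferevent_setwatermark(clt->clt_bev, EV_WRITE,
	    RESUME_LOWAT * clt->clt_sndbufsiz, 0);
	bufferevent_enable(clt->clt_bev, EV_WRITE);
	bufferevent_enable(clt->clt_srvbev, EV_READ);
}

The 32 and 256 factors are the magic numbers Florian mentions: raising
the EV_WRITE low-water mark makes the resume callback fire while data
is still queued, and the 256-buffer ceiling bounds how far the file
read can run ahead of the socket.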
A new twist: TLS is hosed. Just got back to find out httpd's been gone
all day. Found this in /var/log/messages:

Jun  7 05:06:25 gaobagua httpd[8666]: server_accept_tls: TLS accept failed - (null)
Jun  7 05:06:54 gaobagua httpd[8666]: server_accept_tls: TLS accept failed - (null)
Jun  7 05:07:24 gaobagua httpd[23587]: server_accept_tls: TLS accept failed - (null)
Jun  7 05:07:54 gaobagua httpd[8666]: server_accept_tls: TLS accept failed - (null)
Jun  7 05:08:29 gaobagua httpd[8666]: server_accept_tls: TLS accept failed - (null)
Jun  7 06:20:42 gaobagua httpd[32078]: server_accept_tls: TLS accept failed - (null)
Jun  7 06:22:46 gaobagua httpd[32078]: server_accept_tls: TLS accept failed - (null)
Jun  7 07:00:01 gaobagua syslogd: restart
Jun  7 08:59:24 gaobagua httpd[720]: server_accept_tls: TLS accept failed - (null)
Jun  7 13:36:02 gaobagua httpd[13550]: lost child: server terminated; signal 11

Restarted it with -vvv and http connections seem fine, but anything TLS
causes these to come up, slowly, while the browser just spins:

server gaobagua.net, client 9 (2 active), 184.20.162.120:63284 -> 192.249.60.124:443, buffer event error
server gaobagua.net, client 12 (4 active), 184.20.162.120:63288 -> 192.249.60.124:443, buffer event error
server gaobagua.net, client 13 (3 active), 184.20.162.120:63291 -> 192.249.60.124:443, buffer event error
server gaobagua.net, client 12 (2 active), 184.20.162.120:63293 -> 192.249.60.124:443, buffer event error
server gaobagua.net, client 11 (1 active), 184.20.162.120:63290 -> 192.249.60.124:443, buffer event error
server gaobagua.net, client 10 (2 active), 184.20.162.120:63285 -> 192.249.60.124:443, buffer event error
server gaobagua.net, client 22 (1 active), 184.20.162.120:63289 -> 192.249.60.124:443, buffer event error
server gaobagua.net, client 11 (1 active), 184.20.162.120:63287 -> 192.249.60.124:443, TLS accept timeout
server gaobagua.net, client 14 (1 active), 184.20.162.120:63332 -> 192.249.60.124:443, buffer event error

^ The 2nd to last is different: a TLS accept timeout rather than a
buffer event error.
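
[Editor's note: to exercise the failing handshake without a browser, a
plain s_client run against the host and port from the log above should
either print the certificate chain or stall the same way the browser
does. This is just the stock openssl(1) tool, nothing httpd-specific:]

$ openssl s_client -connect gaobagua.net:443 < /dev/null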
