Okay, so here it is: I let it die (after just 71 secs this time!) with
the modified sources and the socket stats:
Relevant message from debug window:
setsockopt: Connection reset by peer
[ALERT] 260/230253 (30302) : frontend_accept(): cannot set the socket 11 in non blocking mode. Giving up
2nd try:
setsockopt: Connection reset by peer
[ALERT] 260/232249 (31233) : frontend_accept(): cannot set the socket 7 in non blocking mode. Giving up
So it is in frontend.c (there it is, I am a coder after all ;))
Socket stats:
Name: HAProxy
Version: 1.5-dev2
Release_date: 2010/08/28
Nbproc: 1
Process_num: 1
Pid: 31233
Uptime: 0d 0h01m25s
Uptime_sec: 85
Memmax_MB: 0
Ulimit-n: 2061
Maxsock: 2061
Maxconn: 1024
Maxpipes: 0
CurrConns: 1
PipesUsed: 0
PipesFree: 0
Tasks: 3
Run_queue: 1
node: kim.mySite.com
description:
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,
MySite-webfarm,FRONTEND,,,0,10,3000,408,202135,2267714,0,0,12,,,,,OPEN,,,,,,,,,1,1,0,,,,0,0,0,12,,,,0,348,46,13,0,0,,0,12,407,,,
MySite-webfarm,realhost,0,0,0,7,,395,202135,2265170,,0,,0,0,0,0,UP,1,1,0,0,0,85,0,,1,1,1,,395,,2,0,,12,L4OK,,0,0,348,46,1,0,0,0,,,,11,0,
MySite-webfarm,BACKEND,0,0,0,7,3000,395,202135,2267714,0,0,,0,0,0,0,UP,1,1,0,,0,85,0,,1,1,0,,395,,1,0,,12,,,,0,348,46,1,0,0,,,,,11,0,
ease-up,BACKEND,0,0,0,0,0,0,0,0,0,0,,0,0,0,0,UP,0,0,0,,0,85,0,,1,2,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,
0x28217800: proto=unix_stream ts=09 age=0s calls=2 rq[f=c08200h,l=41,an=00h,rx=1d,wx=,ax=] rp[f=008002h,l=1187,an=00h,rx=,wx=,ax=] s0=[7,8h,fd=7,ex=] s1=[7,0h,fd=-1,ex=] exp=1d
# table: MySite-webfarm, type: 0, size:1048576, used:93
# table: MySite-webfarm, type: 0, size:1048576, used:93
0x28279160: key=24.22.161.143 use=0 exp=539711 gpc0=0 conn_rate(10000)=0
0x2822ed40: key=38.99.97.70 use=0 exp=563967 gpc0=0 conn_rate(10000)=0
0x2822e890: key=38.101.148.126 use=0 exp=567782 gpc0=0 conn_rate(10000)=0
0x28279c00: key=41.237.139.207 use=0 exp=563638 gpc0=0 conn_rate(10000)=0
0x28279f20: key=62.31.151.187 use=0 exp=571629 gpc0=0 conn_rate(10000)=0
0x28279750: key=64.233.172.18 use=0 exp=551367 gpc0=0 conn_rate(10000)=0
0x282796b0: key=64.236.163.23 use=0 exp=558787 gpc0=0 conn_rate(10000)=0
0x28279430: key=65.52.108.58 use=0 exp=554041 gpc0=0 conn_rate(10000)=0
0x2822e480: key=66.249.65.140 use=0 exp=573178 gpc0=0 conn_rate(10000)=0
0x28279840: key=66.249.65.171 use=0 exp=553594 gpc0=0 conn_rate(10000)=0
0x2822e6b0: key=66.249.65.174 use=0 exp=531842 gpc0=0 conn_rate(10000)=0
0x2822ee30: key=66.249.65.181 use=0 exp=570080 gpc0=0 conn_rate(10000)=0
0x28279d40: key=67.186.46.117 use=0 exp=568763 gpc0=0 conn_rate(10000)=0
0x2822e570: key=67.195.110.151 use=0 exp=567361 gpc0=0 conn_rate(10000)=0
0x28279b60: key=67.195.111.62 use=0 exp=561392 gpc0=0 conn_rate(10000)=0
0x2822eb10: key=67.195.114.235 use=0 exp=561124 gpc0=0 conn_rate(10000)=0
0x2822e200: key=67.195.115.51 use=0 exp=570137 gpc0=0 conn_rate(10000)=0
0x28279de0: key=69.63.210.2 use=0 exp=565285 gpc0=0 conn_rate(10000)=0
0x28279390: key=69.180.33.204 use=0 exp=552748 gpc0=0 conn_rate(10000)=0
0x28279480: key=69.246.203.130 use=0 exp=556538 gpc0=0 conn_rate(10000)=0
0x2822e390: key=70.126.180.46 use=0 exp=515760 gpc0=0 conn_rate(10000)=0
0x2822eed0: key=71.57.189.21 use=0 exp=566366 gpc0=0 conn_rate(10000)=0
0x28279ed0: key=71.196.182.85 use=0 exp=565342 gpc0=0 conn_rate(10000)=0
0x2822e430: key=72.2.133.226 use=0 exp=519530 gpc0=0 conn_rate(10000)=0
0x2822e930: key=72.30.142.215 use=0 exp=519503 gpc0=0 conn_rate(10000)=0
0x28279610: key=72.30.142.248 use=0 exp=562266 gpc0=0 conn_rate(10000)=0
0x2822eca0: key=72.30.161.225 use=0 exp=544068 gpc0=0 conn_rate(10000)=0
0x282797f0: key=74.15.174.110 use=0 exp=550148 gpc0=0 conn_rate(10000)=0
0x282794d0: key=74.47.162.196 use=0 exp=559935 gpc0=0 conn_rate(10000)=0
0x28279c50: key=74.86.147.218 use=0 exp=569162 gpc0=0 conn_rate(10000)=0
0x2822e250: key=77.97.220.144 use=0 exp=518277 gpc0=0 conn_rate(10000)=0
0x28279a20: key=77.98.214.185 use=0 exp=573690 gpc0=0 conn_rate(10000)=0
0x2822e7f0: key=80.195.136.207 use=0 exp=570801 gpc0=0 conn_rate(10000)=0
0x28279b10: key=80.203.57.194 use=0 exp=571481 gpc0=0 conn_rate(10000)=0
0x2822efc0: key=81.109.74.221 use=0 exp=538305 gpc0=0 conn_rate(10000)=0
0x28279520: key=81.222.40.57 use=0 exp=550087 gpc0=0 conn_rate(10000)=0
0x2822e750: key=82.136.37.212 use=0 exp=548981 gpc0=0 conn_rate(10000)=0
0x2822ea70: key=83.36.6.190 use=0 exp=528101 gpc0=0 conn_rate(10000)=0
0x2822eb60: key=85.146.192.111 use=0 exp=527458 gpc0=0 conn_rate(10000)=0
0x2822eac0: key=85.164.63.218 use=0 exp=523925 gpc0=0 conn_rate(10000)=0
0x2822e8e0: key=85.210.231.50 use=0 exp=522232 gpc0=0 conn_rate(10000)=0
0x2822ea20: key=86.7.11.17 use=0 exp=533083 gpc0=0 conn_rate(10000)=0
0x2822ec50: key=86.7.231.64 use=0 exp=542406 gpc0=0 conn_rate(10000)=0
0x28279660: key=86.12.109.228 use=0 exp=571703 gpc0=0 conn_rate(10000)=0
0x2822e980: key=86.44.201.26 use=0 exp=531567 gpc0=0 conn_rate(10000)=0
0x2822e340: key=86.157.17.210 use=0 exp=521835 gpc0=0 conn_rate(10000)=0
0x282798e0: key=88.131.106.2 use=0 exp=558012 gpc0=0 conn_rate(10000)=0
0x28279070: key=88.131.106.3 use=0 exp=547697 gpc0=0 conn_rate(10000)=0
0x2822ebb0: key=88.131.106.7 use=0 exp=568454 gpc0=0 conn_rate(10000)=0
0x2822e610: key=88.131.106.8 use=0 exp=516612 gpc0=0 conn_rate(10000)=0
0x2822e160: key=89.152.119.216 use=0 exp=521049 gpc0=0 conn_rate(10000)=0
0x2822e700: key=89.187.133.45 use=0 exp=525816 gpc0=0 conn_rate(10000)=0
0x2822e110: key=89.204.153.166 use=0 exp=523330 gpc0=0 conn_rate(10000)=0
0x2822e2a0: key=89.243.48.149 use=0 exp=516690 gpc0=0 conn_rate(10000)=0
0x2822e1b0: key=89.243.56.209 use=0 exp=572020 gpc0=0 conn_rate(10000)=0
0x2822e7a0: key=90.192.33.150 use=0 exp=520028 gpc0=0 conn_rate(10000)=0
0x28279930: key=90.212.150.79 use=0 exp=564874 gpc0=0 conn_rate(10000)=0
0x28279700: key=90.219.52.182 use=0 exp=564611 gpc0=0 conn_rate(10000)=0
0x282799d0: key=93.156.227.139 use=0 exp=564205 gpc0=0 conn_rate(10000)=0
0x28279570: key=93.180.244.54 use=0 exp=552745 gpc0=0 conn_rate(10000)=0
0x2822ec00: key=93.186.20.13 use=0 exp=531915 gpc0=0 conn_rate(10000)=0
0x28279a70: key=94.23.220.222 use=0 exp=558942 gpc0=0 conn_rate(10000)=0
0x2822e520: key=94.172.44.59 use=0 exp=516229 gpc0=0 conn_rate(10000)=0
0x28279110: key=94.195.101.38 use=0 exp=554496 gpc0=0 conn_rate(10000)=0
0x2822ef70: key=94.246.126.233 use=0 exp=535674 gpc0=0 conn_rate(10000)=0
0x2822ee80: key=95.150.72.243 use=0 exp=572617 gpc0=0 conn_rate(10000)=0
0x2822e840: key=109.165.199.58 use=0 exp=549732 gpc0=0 conn_rate(10000)=0
0x2822e4d0: key=112.200.219.185 use=0 exp=515703 gpc0=0 conn_rate(10000)=0
0x2822e2f0: key=121.222.240.70 use=0 exp=524070 gpc0=0 conn_rate(10000)=0
0x28279e80: key=128.242.249.11 use=0 exp=565308 gpc0=0 conn_rate(10000)=0
0x28279340: key=166.216.130.47 use=0 exp=548198 gpc0=0 conn_rate(10000)=0
0x28279200: key=173.203.85.247 use=0 exp=539163 gpc0=0 conn_rate(10000)=0
0x2822ede0: key=174.0.37.132 use=0 exp=533528 gpc0=0 conn_rate(10000)=0
0x2822e5c0: key=174.46.170.173 use=0 exp=518059 gpc0=0 conn_rate(10000)=0
0x2822e9d0: key=178.154.160.30 use=0 exp=571521 gpc0=0 conn_rate(10000)=0
0x282791b0: key=188.26.239.203 use=0 exp=542999 gpc0=0 conn_rate(10000)=0
0x2822e3e0: key=189.105.167.105 use=0 exp=548322 gpc0=0 conn_rate(10000)=0
0x2822ed90: key=189.138.145.238 use=0 exp=536739 gpc0=0 conn_rate(10000)=0
0x2822e660: key=190.58.198.101 use=0 exp=517425 gpc0=0 conn_rate(10000)=0
0x282792a0: key=190.88.63.94 use=0 exp=550882 gpc0=0 conn_rate(10000)=0
0x2822ecf0: key=196.15.208.203 use=0 exp=529399 gpc0=0 conn_rate(10000)=0
0x28279e30: key=197.0.111.78 use=0 exp=573453 gpc0=0 conn_rate(10000)=0
0x282795c0: key=201.224.33.237 use=0 exp=573124 gpc0=0 conn_rate(10000)=0
0x282797a0: key=206.116.229.102 use=0 exp=549525 gpc0=0 conn_rate(10000)=0
0x282790c0: key=207.46.12.91 use=0 exp=539057 gpc0=0 conn_rate(10000)=0
0x28279250: key=207.46.199.51 use=0 exp=569228 gpc0=0 conn_rate(10000)=0
0x28279980: key=207.46.199.54 use=0 exp=555208 gpc0=0 conn_rate(10000)=0
0x282792f0: key=207.46.204.243 use=0 exp=540350 gpc0=0 conn_rate(10000)=0
0x28279bb0: key=209.85.228.85 use=0 exp=559738 gpc0=0 conn_rate(10000)=0
0x28279890: key=210.193.49.108 use=0 exp=562622 gpc0=0 conn_rate(10000)=0
0x2822ef20: key=216.176.144.25 use=0 exp=566139 gpc0=0 conn_rate(10000)=0
0x282793e0: key=216.241.182.150 use=0 exp=550463 gpc0=0 conn_rate(10000)=0
0x28279ca0: key=222.152.255.81 use=0 exp=572397 gpc0=0 conn_rate(10000)=0
Anything suspicious? Other than the error, I mean ;)
Cheers,
Joe
Quoting Willy Tarreau <[email protected]>:
Hi Joe,
On Thu, Sep 16, 2010 at 04:49:00PM +0200, R.Nagy József wrote:
Some more details: I let the production server suffer 2 more times to
test a narrowed-down config.
The new config ran the 1.5-dev haproxy instance purely as a rate limiter,
with a running 1.3 instance behind it doing the real backend work.
I really appreciate your involvement in trying to get this issue solved.
So the config for the 1.5 rate limiter (still dying) was narrowed down to:
global
log 127.0.0.1 daemon debug
maxconn 1024
chroot /var/chroot/haproxy2
uid 99
gid 99
daemon
quiet
pidfile /var/run/haproxy-private2.pid
One thing that could be very useful would be to add the stats socket here
in the global section:
stats socket /tmp/haproxy.sock level admin mode 666
stats timeout 1d
Then using the "socat" tool, you can connect to it and launch some
commands to inspect the internal state :
$ socat readline unix-connect:/tmp/haproxy.sock
prompt
> show info
> show stat
> show sess
> show table
> show table mySite-webfarm
I'm particularly interested in those outputs, they will make it easier
to find if we're facing a memory corruption, a resource shortage or any
such trouble. If it's easier for you, you can also chain all the commands
at once and avoid long copy-pastes :
$ echo "show info;show stat;show sess;show table;show table mySite-webfarm" | socat stdio unix-connect:/tmp/haproxy.sock > haproxy-debug.log
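
For anyone who prefers not to depend on socat, the same exchange can be
sketched in a few lines of C. This is only an illustration, not code from
haproxy: the helper names stats_connect(), stats_send_cmd() and
stats_read_all() are made up for this example. It assumes the stats socket
behaves as used above without "prompt": one newline-terminated command line
goes in, the reply comes back, and the connection is then closed by haproxy.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Connect to the UNIX stream socket at `path`. Returns the fd, or -1. */
int stats_connect(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;                          /* path would not fit */

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);            /* length checked above */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Send one command line followed by '\n'. Returns 0 on success, -1 on error. */
int stats_send_cmd(int fd, const char *cmd)
{
    char line[512];
    int len = snprintf(line, sizeof(line), "%s\n", cmd);
    int off = 0;

    if (len < 0 || (size_t)len >= sizeof(line))
        return -1;                          /* command too long */

    while (off < len) {
        ssize_t n = write(fd, line + off, (size_t)(len - off));
        if (n <= 0)
            return -1;
        off += (int)n;
    }
    return 0;
}

/* Read the reply until EOF or until `buf` is full, then NUL-terminate it.
 * Returns the number of bytes stored, or -1 on error. */
ssize_t stats_read_all(int fd, char *buf, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return -1;

    while (used < cap - 1) {
        ssize_t n = read(fd, buf + used, cap - 1 - used);
        if (n < 0)
            return -1;
        if (n == 0)
            break;                          /* peer closed: reply complete */
        used += (size_t)n;
    }
    buf[used] = '\0';
    return (ssize_t)used;
}
```

A caller would do stats_connect("/tmp/haproxy.sock"), then
stats_send_cmd(fd, "show info;show stat"), then stats_read_all() into a
buffer large enough for the dump. The sketch does not retry on EINTR.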
I'm just thinking about something else : there are basically two things
that change with the OS :
1) polling system
you may try to disable kqueue by adding "nokqueue" in the global section.
I don't think it's the issue because kqueue has not changed between 1.4
and 1.5 and there are some happy users of 1.4 on FreeBSD/OpenBSD.
2) struct sizes
the pool allocator merges structs of similar sizes in the same pools. In
the past it has already happened that an uninitialized member that was
always zero caused no trouble on most platforms but caused crashes on
other ones due to it containing data from another use. You can check
pool sizes by starting haproxy in debug mode then issuing a kill -QUIT
on it :
terminal1$ haproxy -db -f $file.cfg
terminal2$ killall -QUIT haproxy
Haproxy will then dump all of its pool statistics to stderr. In fact, you
don't need to do that in production; you can do it on a test machine,
because the output depends only on the binary itself and not on the
environment.
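
To make the "data from another use" failure mode concrete, here is a toy
sketch in C. It is not haproxy's pool allocator: struct conn_like,
struct task_like, struct toy_pool and all function names are invented for
the illustration. It only assumes what is described above: chunks are
shared between structs of the same size and are handed back without being
cleared. A fresh chunk comes from calloc() here to stand in for the
"always zero" case.

```c
#include <stdlib.h>
#include <string.h>

/* Two unrelated structs that happen to have the same size (hypothetical). */
struct conn_like { int fd; int flags; };
struct task_like { int id; int expire; };

#define POOL_MAX_FREE 16

/* Toy pool keyed only by object size: freed chunks are kept and reused as-is. */
struct toy_pool {
    size_t size;
    void  *free_chunks[POOL_MAX_FREE];
    int    nfree;
};

void *toy_pool_alloc(struct toy_pool *p)
{
    if (p->nfree > 0)
        return p->free_chunks[--p->nfree];  /* reused chunk, NOT cleared */
    return calloc(1, p->size);              /* fresh chunk, zero-filled */
}

void toy_pool_free(struct toy_pool *p, void *chunk)
{
    if (p->nfree < POOL_MAX_FREE)
        p->free_chunks[p->nfree++] = chunk; /* keep it for the next user */
    else
        free(chunk);
}

/* Returns what a new user finds in task_like.expire when it forgets to
 * initialize that member after a conn_like user released the same chunk. */
int stale_expire_demo(struct toy_pool *p)
{
    struct conn_like old = { 7, 0x8200 };   /* previous user's data */
    void *chunk = toy_pool_alloc(p);
    struct task_like *t;
    int seen;

    if (!chunk)
        return -1;
    memcpy(chunk, &old, sizeof(old));       /* previous user fills the chunk */
    toy_pool_free(p, chunk);

    t = toy_pool_alloc(p);                  /* new user gets the same chunk */
    t->id = 1;                              /* 'expire' is never set */
    seen = t->expire;                       /* leftover of conn_like.flags */
    toy_pool_free(p, t);
    return seen;
}
```

With a pool sized for struct task_like, the first allocation sees
expire == 0, while stale_expire_demo() reads back the 0x8200 left by the
previous user. Whether two structs really share a pool depends on their
sizes in the given binary, which is what the pool dump shows.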
And yeah, it died with the same socket error message as yesterday.
(The server was hit by 30-40 reqs/sec during this time; it died after ~30 mins.)
I noticed that the same error message can be found in two places. Could
you please adapt them both in order to also dump the FD value?
In src/session.c around line 215, please change :
Alert("accept(): cannot set the socket in non blocking mode. Giving up\n");
with :
perror("fcntl");
Alert("session_accept(): cannot set the socket %d in non blocking mode. Giving up\n", cfd);
And in src/frontend.c, around line 89, replace the same line with :
perror("setsockopt");
Alert("frontend_accept(): cannot set the socket %d in non blocking mode. Giving up\n", cfd);
I'm almost sure it's frontend_accept() that returns the error, and I'm
interested in knowing the reported file descriptor which probably is
buggy, as well as the errno code.
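
The pattern being asked for (report both the file descriptor and the errno
text when the call fails) can be tried in isolation with a small standalone
helper. This is a sketch only, not the code from src/session.c or
src/frontend.c; set_nonblocking_verbose() is a hypothetical name, and it
uses fcntl() with O_NONBLOCK simply because that is the usual way to
switch a descriptor to non-blocking mode.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Put `fd` into non-blocking mode.
 * On failure, print the fd and the errno text to stderr and return -1,
 * leaving errno as set by the failing fcntl() call. Returns 0 on success. */
int set_nonblocking_verbose(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        int saved = errno;                  /* fprintf() may change errno */

        fprintf(stderr,
                "cannot set the socket %d in non blocking mode: %s\n",
                fd, strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}
```

Calling it on an invalid descriptor such as -1 fails with EBADF and prints
the offending value, which is the kind of detail that makes a bogus fd easy
to tell apart from a valid fd on which the call was refused.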
Thanks a lot for what you can do !
Willy