On Thu, 28 Aug 2008, Steven Hartland wrote:
We're using lighttpd here for a new project and we're having issues whereby
it simply stops processing after 1-2 days.
Having looked at it in some detail this morning it seems that the kernel is
resetting the connection without notifying the lighttpd process that there is
a new connection attempt. I assume that the listen queue is full, but why
kevent is not notifying lighttpd that there are outstanding events is beyond
me.
The connections getting reset without application notification is a classic
symptom of a full listen queue. A couple of questions:
(1) What FreeBSD version?
(2) Are you using accept filters?
(3) If possible, are you able to instrument lighttpd so that you can trigger
it to query SO_LISTENQLIMIT, SO_LISTENQLEN, and SO_LISTENINCQLEN on the
listen socket once things have gone wrong? These respectively (and perhaps
obviously) query the current administrative limit on queue depth, the
current queue depth of completed connections, and the current queue depth
of incomplete connections. The last of these will only be used with
accept filters on recent FreeBSD network stacks (since the syncache was
added).
Hopefully doing (3) will allow us to try to determine whether it's indeed the
case that somehow the listen queue or event handling has gotten "wedged" in
some way.
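
For (3), a minimal sketch of the instrumentation might look something like
the following. The helper names query_int_sockopt and dump_listen_queue are
made up for illustration and are not part of lighttpd; how you obtain the
listen socket descriptor and what triggers the dump are left to you. The
three SO_LISTEN* options are FreeBSD-specific, so the sketch falls back to a
notice where they are not defined.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Read one integer-valued SOL_SOCKET option; returns -1 if getsockopt fails. */
static int
query_int_sockopt(int fd, int opt)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, opt, &val, &len) == -1)
        return (-1);
    return (val);
}

/*
 * Hypothetical helper (not part of lighttpd): print the queue limit, the
 * completed-connection queue depth and the incomplete-connection queue
 * depth of the listen socket listen_fd.
 */
static void
dump_listen_queue(int listen_fd, FILE *out)
{
#if defined(SO_LISTENQLIMIT) && defined(SO_LISTENQLEN) && \
    defined(SO_LISTENINCQLEN)
    fprintf(out, "qlimit=%d qlen=%d incqlen=%d\n",
        query_int_sockopt(listen_fd, SO_LISTENQLIMIT),
        query_int_sockopt(listen_fd, SO_LISTENQLEN),
        query_int_sockopt(listen_fd, SO_LISTENINCQLEN));
#else
    (void)listen_fd;
    fprintf(out, "SO_LISTEN* options not defined on this platform\n");
#endif
}
```

One way to trigger it would be to have a signal handler set a flag and call
dump_listen_queue(listen_fd, stderr) from the main event loop when the flag
is seen, since fprintf is not safe to call from the handler itself.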
In terms of analyzing the state of the machine -- if you have a kernel.debug
around and are willing to do a bit of digging, the best thing to do would be
to track down the listen socket and directly inspect it using kgdb to dump its
field contents. This can be done on a live box by attaching kgdb to kernel
memory using /dev/mem as the target device. You can find the kernel memory
address of the listen socket by tracking it down in fstat -- a typical entry
might look like this:
root inetd 1158 9* internet stream tcp c5350000
So you can do a "print *(struct socket *)0xc5350000" to print out the socket
structure once attached to /dev/mem. If you need more pointers on how to do
this, send me a private e-mail and I can walk you through it in detail.
Robert N M Watson
Computer Laboratory
University of Cambridge
The following is a truss of the process which is currently in
this state:-
kevent(6,0x0,0,{},11096,{1.000000000}) = 0 (0x0)
gettimeofday({1219920575.149428},0x0) = 0 (0x0)
kevent(6,0x0,0,{},11096,{1.000000000}) = 0 (0x0)
gettimeofday({1219920576.150443},0x0) = 0 (0x0)
ktrace of the operation as well:-
28363 lighttpd RET kevent 0
28363 lighttpd CALL gettimeofday(0x7fffffffeb20,0)
28363 lighttpd RET gettimeofday 0
28363 lighttpd CALL kevent(0x6,0,0,0x800e66000,0x2b58,0x7fffffffeb20)
28363 lighttpd GIO fd 6 wrote 0 bytes
""
28363 lighttpd GIO fd 6 read 0 bytes
""
28363 lighttpd RET kevent 0
28363 lighttpd CALL gettimeofday(0x7fffffffeb20,0)
28363 lighttpd RET gettimeofday 0
28363 lighttpd CALL kevent(0x6,0,0,0x800e66000,0x2b58,0x7fffffffeb20)
28363 lighttpd GIO fd 6 wrote 0 bytes
""
28363 lighttpd GIO fd 6 read 0 bytes
""
28363 lighttpd RET kevent 0
28363 lighttpd CALL gettimeofday(0x7fffffffeb20,0)
28363 lighttpd RET gettimeofday 0
28363 lighttpd CALL kevent(0x6,0,0,0x800e66000,0x2b58,0x7fffffffeb20)
28363 lighttpd GIO fd 6 wrote 0 bytes
""
28363 lighttpd GIO fd 6 read 0 bytes
""
28363 lighttpd RET kevent 0
28363 lighttpd CALL gettimeofday(0x7fffffffeb20,0)
28363 lighttpd RET gettimeofday 0
28363 lighttpd CALL kevent(0x6,0,0,0x800e66000,0x2b58,0x7fffffffeb20)
28363 lighttpd GIO fd 6 wrote 0 bytes
""
28363 lighttpd GIO fd 6 read 0 bytes
""
28363 lighttpd RET kevent 0
28363 lighttpd CALL gettimeofday(0x7fffffffeb20,0)
28363 lighttpd RET gettimeofday 0
28363 lighttpd CALL kevent(0x6,0,0,0x800e66000,0x2b58,0x7fffffffeb20)
28363 lighttpd GIO fd 6 wrote 0 bytes
""
28363 lighttpd GIO fd 6 read 0 bytes
""
28363 lighttpd RET kevent 0
28363 lighttpd CALL gettimeofday(0x7fffffffeb20,0)
28363 lighttpd RET gettimeofday 0
28363 lighttpd CALL kevent(0x6,0,0,0x800e66000,0x2b58,0x7fffffffeb20)
tcpdump shows:-
12:10:29.475255 IP (tos 0x10, ttl 64, id 9536, offset 0, flags [DF], proto:
TCP (6), length: 64) client.61224 > server.80: S, cksum 0x6d22 (incorrect (->
0xedfa), 291994449:291994449(0) win 65535 <mss 1460,nop,wscale
1,nop,nop,timestamp 3661727139 0,sackOK,eol>
12:10:29.481396 IP (tos 0x0, ttl 61, id 25503, offset 0, flags [DF], proto:
TCP (6), length: 60) server.80 > client.61224: S, cksum 0xbf22 (correct),
3444532576:3444532576(0) ack 291994450 win 65535 <mss 1460,nop,wscale
9,sackOK,timestamp 3136311843 3661727139>
12:10:29.481419 IP (tos 0x10, ttl 64, id 9538, offset 0, flags [DF], proto:
TCP (6), length: 52) client.61224 > server.80: ., cksum 0x6d16 (incorrect (->
0x6bd2), 1:1(0) ack 1 win 33304 <nop,nop,timestamp 3661727145 3136311843>
12:10:29.487519 IP (tos 0x10, ttl 61, id 25504, offset 0, flags [DF], proto:
TCP (6), length: 40) server.80 > client.61224: R, cksum 0x20c7 (correct),
3444532577:3444532577(0) win 0
This may have been raised before, back in 2003, as bug kern/57380
but it was closed after no response from the reporter.
Another possibly related issue is:-
http://trac.lighttpd.net/trac/ticket/1734
I've currently got one of the production machines offline
with this error ( hence the important flag ) in the hope
that someone can suggest a test which will shed more light
on the issue before I restart it.
Regards
Steve
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"