Re: varnish crashes
On 24-1-2010 20:31, Michael S. Fischer wrote:

> The other most common reason why the varnish supervisor can start killing off children is when they are blocked waiting on a page-in, which is usually due to VM overcommit (i.e., storage file size significantly exceeds RAM and you have a very large hot working set). You can usually see that when iostat -x output shows the I/O busy % is close to 100, meaning the disk is saturated. You can also see that in vmstat (look at the pi/po columns if you're using a file, or si/so if you're using malloc).

Well, my balancers have 8GB RAM, and were using a 350GB backend file. I saw that disk I/O was really high; 174% busy is kinda busy :)

                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4      100.3  85.2  4101.1  1355.9    2  37.5 174
ad6      102.5  85.2  4092.7  1355.9    3  29.1 150

So thanks for that! I have now set the backend to an 8GB file, and I hope that will be better. Do you have any recommendations, other than buying faster disks? With Squid I was used to filling up the 300GB disks (we also serve large images), but I guess Varnish does not work that way.

--
With kind regards,
Angelo Höngens
systems administrator
MCSE on Windows 2003
MCSE on Windows 2000
MS Small Business Specialist
--
NetMatch
tourism internet software solutions
Ringbaan Oost 2b
5013 CA Tilburg
+31 (0)13 5811088
+31 (0)13 5821239
a.hong...@netmatch.nl
www.netmatch.nl
--
___
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc
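The two checks discussed in the message above (is the disk saturated, and how far does the storage file exceed RAM) can be sketched in a few lines of Python. This is only an illustration: the column layout is the one from the iostat output quoted above, reading the fused "1355.92" / "1355.93" values as kw/s 1355.9 followed by a wait of 2 and 3 is an assumption, and the function names and the 90% threshold are hypothetical.

```python
# Sketch: flag devices whose %b (busy) column in FreeBSD `iostat -x` output
# is at or above a threshold, and compute how overcommitted the storage is.
# The sample reproduces the figures from the message; the split of the
# kw/s and wait columns is an assumed reading of the flattened table.
IOSTAT_SAMPLE = """\
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4      100.3  85.2  4101.1  1355.9    2  37.5 174
ad6      102.5  85.2  4092.7  1355.9    3  29.1 150
"""


def saturated_devices(iostat_text, threshold=90.0):
    """Return {device: busy_percent} for rows whose %b is >= threshold."""
    lines = [ln.split() for ln in iostat_text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    busy_col = header.index("%b")
    return {
        row[0]: float(row[busy_col])
        for row in rows
        if len(row) == len(header) and float(row[busy_col]) >= threshold
    }


def overcommit_ratio(storage_gb, ram_gb):
    """How many times larger the storage file is than physical RAM."""
    return storage_gb / ram_gb


if __name__ == "__main__":
    print(saturated_devices(IOSTAT_SAMPLE))  # {'ad4': 174.0, 'ad6': 150.0}
    print(overcommit_ratio(350, 8))          # 43.75
```

With a 350GB file on 8GB of RAM the storage is more than 40 times larger than memory, which fits the page-in explanation quoted above; with an 8GB file the ratio drops to 1.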
Re: varnish crashes
In message 4b5d70b5.5080...@netmatch.nl, Angelo Höngens writes:

>> How are your disks configured?
>
> 2 cheap SATA disks in a gmirror (it's a simple Dell R300).

Hmm, that's going to hurt, obviously...

You would probably have been better off not mirroring and giving Varnish a -sfile for each disk.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Re: varnish crashes
On 25-1-2010 11:24, Poul-Henning Kamp wrote:

> In message 4b5d70b5.5080...@netmatch.nl, Angelo Höngens writes:
>
>>> How are your disks configured?
>>
>> 2 cheap SATA disks in a gmirror (it's a simple Dell R300).
>
> Hmm, that's going to hurt, obviously...
>
> You would probably have been better off not mirroring and giving Varnish a -sfile for each disk.

I'll take it into consideration, but first I'm going to run with the current configuration for a while to make sure Varnish keeps responding. The disks are now 1-3% busy, and everything seems to run nicely.

--
With kind regards,
Angelo Höngens
Re: varnish crashes
On 23-1-2010 20:57, Michael Fischer wrote:

> On Sat, Jan 23, 2010 at 2:20 AM, Angelo Höngens a.hong...@netmatch.nl wrote:
>
>> (second try, I found out I was subscribed using a wrong email address)
>>
>> Hey,
>>
>> I am having some problems with Varnish. Unfortunately (depending on how you look at it), I had to replace our Squid cluster with Varnish in a day. And now we are finding out that we have some issues with it: sometimes Varnish just stops working.
>>
>> We have 4 balancers, each running FreeBSD 7.2 with 'device carp' compiled in. I haven't dared upgrade to 8.0 yet, because I had problems on my test machine earlier with IPv6 and carp interfaces on 8.0.
>>
>> [ang...@nmt-nlb-06 ~]$ uname -a
>> FreeBSD nmt-nlb-06.netmatchcolo1.local 7.2-RELEASE FreeBSD 7.2-RELEASE #0: Mon Jun 15 19:25:03 CEST 2009 r...@nmt-nlb-06.netmatchcolo1.local:/usr/obj/usr/src/sys/NMT-NLB-06 amd64
>>
>> Here's an example of a varnishd crashing; this is in /var/log/messages:
>>
>> Jan 23 09:49:39 nmt-nlb-06 varnishd[47478]: Child (47479) not responding to ping, killing it.
>> Jan 23 10:49:43 nmt-nlb-06 kernel: pid 47479 (varnishd), uid 80: exited on signal 3
>> Jan 23 09:49:43 nmt-nlb-06 varnishd[47478]: Child (47479) not responding to ping, killing it.
>> Jan 23 09:49:43 nmt-nlb-06 varnishd[47478]: Child (47479) not responding to ping, killing it.
>> Jan 23 09:49:43 nmt-nlb-06 varnishd[47478]: child (54810) Started
>> Jan 23 09:49:48 nmt-nlb-06 varnishd[47478]: Pushing vcls failed: CLI communication error
>> Jan 23 09:49:48 nmt-nlb-06 varnishd[47478]: Child (54810) said Closed fds: 4 5 6 7 11 12 14 15
>> Jan 23 09:49:48 nmt-nlb-06 varnishd[47478]: Child (54810) said Child starts
>> Jan 23 09:51:15 nmt-nlb-06 varnishd[47478]: Child (54810) said managed to mmap 2319266349056 bytes of 2319266349056
>> Jan 23 09:51:15 nmt-nlb-06 varnishd[47478]: Child (54810) said Ready
>>
>> Does anyone know what could cause this?
>
> What is thread_pool_max set to? Have you tried lowering it? We have found that on systems with very high cache-hit ratios, 16 threads per CPU is the sweet spot to avoid context-switch saturation.

[ang...@nmt-nlb-03 ~]$ varnishadm -T localhost:81 param.show | grep thread_pool
thread_pool_add_delay      20 [milliseconds]
thread_pool_add_threshold  2 [requests]
thread_pool_fail_delay     200 [milliseconds]
thread_pool_max            500 [threads]
thread_pool_min            5 [threads]
thread_pool_purge_delay    1000 [milliseconds]
thread_pool_timeout        300 [seconds]
thread_pools               2 [pools]

thread_pool_max is set to 500 threads. But I just increased it to 4000 (as per http://varnish.projects.linpro.no/wiki/Performance), as 'top' shows me it's using around 480~490 threads now. You suggest lowering it; what would be the effect of that? I would think it would run out of threads or something? Well, we'll see what happens with the increased threads.

I've also just increased thread_pools from 2 to 4 (4 cores).

--
With kind regards,
Angelo Höngens
Re: varnish crashes
On Jan 24, 2010, at 7:23 AM, Angelo Höngens wrote:

>> What is thread_pool_max set to? Have you tried lowering it? We have found that on systems with very high cache-hit ratios, 16 threads per CPU is the sweet spot to avoid context-switch saturation.
>
> [ang...@nmt-nlb-03 ~]$ varnishadm -T localhost:81 param.show | grep thread_pool
> thread_pool_add_delay      20 [milliseconds]
> thread_pool_add_threshold  2 [requests]
> thread_pool_fail_delay     200 [milliseconds]
> thread_pool_max            500 [threads]
> thread_pool_min            5 [threads]
> thread_pool_purge_delay    1000 [milliseconds]
> thread_pool_timeout        300 [seconds]
> thread_pools               2 [pools]
>
> thread_pool_max is set to 500 threads. But I just increased it to 4000 (as per http://varnish.projects.linpro.no/wiki/Performance), as 'top' shows me it's using around 480~490 threads now. You suggest lowering it; what would be the effect of that? I would think it would run out of threads or something? Well, we'll see what happens with the increased threads.

Increasing concurrency is unlikely to solve the problem, although setting the number of thread pools to the number of CPUs is probably a good idea.

Assuming a high hit ratio and high CPU utilization (you haven't posted either), lowering concurrency (i.e. reducing thread_pool_max) can help reduce the CPU contention incurred by context switching. If maximum concurrency is reached, incoming connections will be deferred to the TCP listen(2) backlog (the overflowed_requests counter in varnishstat increases when this happens). When a request reaches the head of the queue, it will then be picked up by a processing thread. The net effect is some additional latency, but probably not as much as you're experiencing if your CPU is swamped with context switches.

There are a few cases where increasing thread_pool_max can help, in particular where you have a high cache-miss ratio and slow origin servers. But if CPU usage is already high, it will only make the problem worse.

BTW, on FreeBSD you can view the current length of the listen(2) backlog via

netstat -aL

By default, varnishd's listen(2) backlog is 512; as long as you don't see the length hit that value you should be OK.

--Michael
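As a rough illustration of the two rules of thumb in the message above, the Python sketch below turns "one pool per CPU, about 16 threads per CPU" into parameter values and scans netstat -aL style output for a listen queue that has reached its maximum. Several things here are assumptions rather than facts from the thread: the function names are hypothetical, the sample netstat rows are invented, the qlen/incqlen/maxqlen layout is what FreeBSD is assumed to print, and whether thread_pool_max counts threads per pool or across all pools depends on the Varnish version, so check param.show before applying any number.

```python
import re


def suggested_thread_params(cpu_count, threads_per_cpu=16):
    """Apply the rules of thumb from the message: one thread pool per CPU
    and roughly 16 worker threads per CPU (for high-hit-ratio workloads).
    The result treats thread_pool_max as a total across all pools."""
    return {
        "thread_pools": cpu_count,
        "thread_pool_max": cpu_count * threads_per_cpu,
    }


# Assumed layout of FreeBSD `netstat -aL` output; these rows are invented
# for illustration and do not come from the thread.
NETSTAT_SAMPLE = """\
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen         Local Address
tcp4  0/0/512        *.80
tcp4  512/0/512      *.8080
"""

QUEUE_RE = re.compile(r"^(\d+)/(\d+)/(\d+)$")


def full_listen_queues(netstat_text):
    """Return local addresses whose listen queue length reached its maximum."""
    full = []
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        match = QUEUE_RE.match(fields[1])
        if match is None:
            continue  # header lines and anything else that is not a queue row
        qlen, _incqlen, maxqlen = (int(group) for group in match.groups())
        if qlen >= maxqlen:
            full.append(fields[2])
    return full


if __name__ == "__main__":
    print(suggested_thread_params(4))
    print(full_listen_queues(NETSTAT_SAMPLE))
```

For the 4-core balancers described in this thread, the first function suggests 4 pools and 64 threads in total, far below the 4000 that was configured; the second would report only the hypothetical *.8080 listener as full.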
Re: varnish crashes
On Jan 24, 2010, at 10:40 AM, Angelo Höngens wrote:

> According to top, the CPU usage for the varnishd process is 0.0% at 400 req/sec. The load over the past 15 minutes is 0.45, probably mostly because of haproxy running on the same machine. So I don't think load is a problem. My problem is that Varnish sometimes just crashes or stops responding.
>
> My cache hit ratio is not that high, around 80%, and the backend servers can be slow at times (quite complex .NET web apps).
>
> But I've changed some settings, and I am waiting for the next time Varnish stops responding. I'm beginning to think it's something that grows over time; after restarting the varnish process, things tend to run smoothly for a while. I'll just keep monitoring it.

The other most common reason why the varnish supervisor can start killing off children is when they are blocked waiting on a page-in, which is usually due to VM overcommit (i.e., storage file size significantly exceeds RAM and you have a very large hot working set). You can usually see that when iostat -x output shows the I/O busy % is close to 100, meaning the disk is saturated. You can also see that in vmstat (look at the pi/po columns if you're using a file, or si/so if you're using malloc).

--Michael
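The figures Angelo quotes above (400 req/sec, roughly 80% hits, backends that can be slow) also give a feel for how many worker threads are tied up waiting on the origin servers, which is the case where a higher thread_pool_max was said to help. The sketch below uses Little's law (average number in flight = arrival rate × time in system); that law is brought in here for illustration and is not something the thread itself mentions, and the 0.5 second backend response time is a purely hypothetical value.

```python
def backend_requests_per_sec(total_req_per_sec, hit_ratio):
    """Requests per second that miss the cache and go to the backends."""
    return total_req_per_sec * (1.0 - hit_ratio)


def threads_busy_on_backend(miss_req_per_sec, backend_latency_sec):
    """Little's law: average requests in flight = arrival rate * time in system."""
    return miss_req_per_sec * backend_latency_sec


# 400 req/sec and an 80% hit ratio are the figures from the message.
misses = backend_requests_per_sec(400, 0.80)      # about 80 req/sec

# 0.5 s is a hypothetical average backend response time, not a measured one.
in_flight = threads_busy_on_backend(misses, 0.5)  # about 40 threads

if __name__ == "__main__":
    print(round(misses), round(in_flight))
```

Under those assumptions only a few dozen threads would be waiting on backends at any moment, so the thread limit is unlikely to be the bottleneck unless backend latency is many times higher than assumed here.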
Re: varnish crashes
In message 4b5ad8b0.6090...@netmatch.nl, Angelo Höngens writes:

> By the way: the balancers do a total of 2000 req/sec now, but when stress testing I can easily get 9000 cache hits per second. So I don't think it's hanging on the upper limits of its performance.

At that level of load, make sure to kldload the HTTP accept filter.

Your varnishstat output looks pretty OK.

Have you configured health-polling of all those backends?

--
Poul-Henning Kamp
Re: Varnish crashes everyday and same time
You need to type "start" after starting varnishd in debug mode.

janis
a satisfied Varnishd user :)

On Thursday 22 November 2007 12:18, Erik wrote:

> Hi again,
>
> I made some logging of the memory and it looks fine. I also turned off VCL trace, but that didn't solve it. The crash happened again today, but a few hours later than usual.
>
> I tried to start varnishd in debug mode, but I can't get it to work. When I set it to -d or -d -d, it starts, but no connection can be made to it. Any ideas?
>
> I forgot to mention that I'm running Varnish on a Virtual Server 2005 with 512 MB RAM (150 MB free) and 10 GB disk space.
>
> / Erik
RE: Re: Varnish crashes everyday and same time
Hi,

This is what I do and what I get:

/etc/init.d/varnish start
Starting Varnish: varnishUsing old SHMFILE rolling(2)...

It seems to me that Varnish is running? But when I try to connect, it doesn't work! Although when I run without -d -d or -d, it works!

I would really like to submit some log data from varnishd, but since I can't get debug mode to work, it has to wait :(

/ Erik

Original Message ---

> You need to type "start" after starting varnishd in debug mode.
>
> janis
> a satisfied Varnishd user :)
>
> On Thursday 22 November 2007 12:18, Erik wrote:
>
>> Hi again,
>>
>> I made some logging of the memory and it looks fine. I also turned off VCL trace, but that didn't solve it. The crash happened again today, but a few hours later than usual.
>>
>> I tried to start varnishd in debug mode, but I can't get it to work. When I set it to -d or -d -d, it starts, but no connection can be made to it. Any ideas?
>>
>> I forgot to mention that I'm running Varnish on a Virtual Server 2005 with 512 MB RAM (150 MB free) and 10 GB disk space.
>>
>> / Erik
Varnish crashes everyday and same time
Hi,

Varnish crashes every day at the same time. This is what I got from the log files:

12 SessionOpen c xx.xx.xx.xx 31989
0 Debug Acceptor is epoll
0 Error CLI read 0 (errno=32)

I also found this thread in the mailing list archive from July:

http://projects.linpro.no/pipermail/varnish-misc/2007-July/000670.html

That is the last post on that subject; no answer was posted to Anup Shukla. I don't know if it is the same problem, but one of the posts shows the same error.

I'm running Varnish compiled from source on Debian 4.0 Etch. I'm going to start a job that logs memory usage to see if that is the problem.

/ Erik
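One small aid for reading the "errno=32" in the log line above: errno numbers can be translated to their symbolic name and message with Python's standard library. The helper name below is hypothetical, and errno numbering is platform-specific, so this should be run on the same kind of system as the Varnish host (Debian Linux in Erik's case, where 32 is expected to be EPIPE, "Broken pipe"). What that implies for this particular crash is not established anywhere in the thread.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    name, message = describe_errno(32)
    print(name, "-", message)
```

EPIPE generally means a write went to a pipe or socket whose other end had already been closed, which would at least be consistent with the management and child processes losing their CLI connection.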