Re: [squid-users] High CPU usage and degraded service time after 2 weeks of activity
Tony,

On Mon, Jun 16, 2008 at 7:31 PM, Anthony Tonns [EMAIL PROTECTED] wrote:
> Did you ever find a resolution to this issue? I'm running a very similar
> config and running into very similar problems [...]

No. I let the broken Squid run for a few days, waiting for someone to help us diagnose the problem, but as I didn't get any answer, we restarted Squid since the service was really degraded. That solved the problem and we haven't reproduced it since.

FYI, we don't have a lot of regexp rules (a few refresh patterns and around 20 user-agent ACLs).

I tried to oprofile the production Squid when we hit the problem but didn't succeed. If someone has a good oprofile tutorial, I'm more than interested, as I haven't found anything useful yet.

Rest assured I'll keep the list informed if I have any news about the problem.

--
Guillaume
RE: [squid-users] High CPU usage and degraded service time after 2 weeks of activity
Guillaume,

Did you ever find a resolution to this issue? I'm running a very similar config and running into very similar problems - only on more servers using more memory and the RHEL squid package on CentOS 5 x86_64. Same symptoms - no paging going on, only 5.5G of the 8G of RAM in use. It will run fine for a few days, but then squid will totally consume one of the four cores in the system (two dual-core AMD Opteron(tm) Processor 2212), whereas after a restart it uses only 10-20% of one core. The only significant difference other than sizing is that I have memory_replacement_policy set to lru instead of heap GDSF.

I haven't had the opportunity to put squid in debug mode yet to see if I get the same errors in the logs, but there's nothing fishy in cache.log with debug_options ALL,1 33,2 set.

Thanks,
Tony

-----Original Message-----
From: Guillaume Smet [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 26, 2008 2:35 PM
To: squid-users@squid-cache.org
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: [squid-users] High CPU usage and degraded service time after 2 weeks of activity

Hi squid-users,

We recently experienced a problem on our new Squid setup (2 Squid servers configured as reverse proxies - mostly the same configuration as before, except that we allocated more memory and disk on the new servers; the old boxes didn't have this problem).

After 2 weeks of very good performance, both Squid instances began to use a lot of CPU (between 75 and 100% of one core instead of between 0 and 10%), and performance became really bad, especially during peak hours. Same number of queries/s, same hit ratio, but service time was really degraded. We left both Squids running like that for a week and the situation didn't improve. We restarted both Squid servers today and it fixed the problem for now: service time is back to normal.

We found nothing in cache.log, so we decided to run one of the servers with full debug for a couple of minutes to see if we could find useful information. During these two minutes, we got a lot of "clientReadRequest: FD XXX: no data to process ((11) Resource temporarily unavailable)" lines in the logs (20k during these 2 minutes), but we don't know if that is related. Here is a bit of context around a "Resource temporarily unavailable" line:

2008/02/26 17:25:55| destroying entry 0x3b0e2500: 'Connection: keep-alive'
2008/02/26 17:25:55| cbdataFree: 0x2aab3c2ce2a8
2008/02/26 17:25:55| cbdataFree: 0x2aab3c2ce2a8 has 1 locks, not freeing
2008/02/26 17:25:55| clientKeepaliveNextRequest: FD 659 reading next req
2008/02/26 17:25:55| commSetTimeout: FD 659 timeout 120
2008/02/26 17:25:55| clientReadRequest: FD 659: reading request...
2008/02/26 17:25:55| clientReadRequest: FD 659: no data to process ((11) Resource temporarily unavailable)
2008/02/26 17:25:55| cbdataLock: 0x2aab0bdf7418
2008/02/26 17:25:55| cbdataValid: 0x2aab0bdf7418
2008/02/26 17:25:55| cbdataUnlock: 0x2aab0bdf7418
2008/02/26 17:25:55| commSetSelect: FD 659 type 1
2008/02/26 17:25:55| commSetEvents(fd=659)
2008/02/26 17:25:55| cbdataUnlock: 0x2aab3c2ce2a8

We also noticed negative numbers in the memory information from the cachemgr, but we don't know if that's relevant:

Memory usage for squid via mallinfo():
        Total space in arena:  -1419876 KB
        Ordinary blocks:       -1420149 KB    579 blks
        Small blocks:                 0 KB      0 blks
        Holding blocks:            7564 KB      8 blks
        Free Small blocks:            0 KB
        Free Ordinary blocks:       272 KB
        Total in use:          -1412585 KB  100%
        Total free:                 272 KB    0%
        Total size:            -1412312 KB

Background information:

CentOS 5 x86_64
Squid 2.6STABLE18
8 GB of memory
one Xeon E5345 @ 2.33GHz per box
~ 15 Mb/s per box during peak hours
~ 200 requests/s

Cache configuration:

cache_mem 2000 MB
cache_dir aufs /data/services/squid/cache 8000 16 256
cache_swap_low 90
cache_swap_high 95
cache_replacement_policy lru
memory_replacement_policy heap GDSF
maximum_object_size_in_memory 150 KB

The setup is a reverse proxy setup with several ACLs, 2 active ports, 2 delay pools, and ICP between both servers, but nothing really fancy. I can provide the full squid.conf if needed. The Squid process was using approximately 3.2 GB of memory on each box.

Does anybody have any idea on how we can fix this problem or how we can diagnose what happens?

Thanks in advance.

--
Guillaume
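A side note on the negative cachemgr numbers: glibc's mallinfo() reports its counters as C `int` (signed 32-bit), so once the heap crosses 2 GiB the values wrap negative on a 64-bit box like this one. A quick sanity check, as a sketch, using the "Total space in arena" figure quoted above:

```shell
# mallinfo() counters are signed 32-bit ints, so heaps over 2 GiB wrap
# negative. Undo the wrap by adding 2^32 bytes back to the reported value.
reported_kb=-1419876                            # "Total space in arena" from cachemgr above
actual_bytes=$(( reported_kb * 1024 + 2**32 ))  # recover the true byte count
echo "real arena: $(( actual_bytes / 1024 / 1024 )) MiB"
```

That comes out to roughly 2.7 GiB of heap, which is consistent with a ~3.2 GB process (the rest being mmapped regions and other non-arena memory), so the negative numbers are a reporting artifact rather than corruption.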
Re: [squid-users] High CPU usage and degraded service time after 2 weeks of activity
Are you using lots of regular expression rules? Is this under Linux? The Wikipedia guys had a big problem with gnumalloc + regex rules causing Squid to degrade much like you've described.

You should be able to install the oprofile profiling suite on CentOS/RHEL; I suggest doing that (and installing the debugging version of the libc package) and then getting some CPU time profiles out when Squid is running normally versus running abnormally. Lodge all of that via Bugzilla.

(oprofile (Linux) and hwpmc (FreeBSD) rock.)

Adrian

On Mon, Jun 16, 2008, Anthony Tonns wrote:
> Guillaume, Did you ever find a resolution to this issue? I'm running a
> very similar config and running into very similar problems [...]
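Since an oprofile tutorial was asked for earlier in the thread, a minimal session on CentOS/RHEL might look like the sketch below. The package name and the squid binary path are assumptions for this distro; adjust to your system, and note opcontrol must be run as root.

```shell
# Hypothetical minimal oprofile session for a running Squid; paths and
# package names are assumptions, adjust for your system.
yum install oprofile                  # plus glibc debuginfo if available
opcontrol --init                      # load the oprofile kernel module
opcontrol --setup --no-vmlinux        # skip kernel symbols, profile userland only
opcontrol --start                     # begin sampling system-wide
sleep 120                             # let Squid run (capture once healthy, once degraded)
opcontrol --shutdown                  # stop sampling and flush samples
opreport --symbols /usr/sbin/squid    # per-symbol CPU breakdown for the squid binary
```

Comparing the per-symbol output from a healthy run against a degraded run should show where the extra CPU is going (e.g. regex matching or allocator routines).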
RE: [squid-users] High CPU usage and degraded service time after 2 weeks of activity
> Are you using lots of regular expression rules? Is this under Linux?
> [...]

CentOS is Linux ;-) so I will look into getting oprofile set up in our sandbox environment and hammering away as a first step.

Also, what do you consider lots of regex rules? I have about 20 or so rules that match on req_header with a regex of .* (i.e. does this header exist) and then around 150 lines in some files that match Via and User-Agent headers.

A quick search for squid gnumalloc regex, etc. didn't yield many useful results. Can you clue me in on the problems (and solutions!) you're referring to? Is it gnumalloc or the regex that is the problem? Is dlmalloc the solution?

Thanks,
Tony
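An aside on those `.*` existence checks: in Squid's `req_header` ACL the value is a regular expression, so a lone `.` (any single character) already matches every non-empty header value, which is all an "is this header set" test needs. A sketch, with hypothetical ACL names:

```
# Hypothetical existence-check ACLs; "." matches any non-empty header
# value, so ".*" (which must also consider the empty match) is unneeded.
acl has_via req_header Via .
acl has_ua  req_header User-Agent .
```

The saving per request is small; the larger cost in setups like this usually comes from the volume of regex ACLs evaluated per request, which is why profiling first is the right call.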
Re: [squid-users] High CPU usage and degraded service time after 2 weeks of activity
On Mon, Jun 16, 2008, Anthony Tonns wrote:
> CentOS is Linux ;-) so I will look into getting oprofile setup in our
> sandbox environment and hammering away as a first step.

Ok.

> Also, what do you consider lots of regex rules? I have about 20 or so
> rules that match on req_header with a regex of .* (i.e. does this header
> exist) and then around 150 lines in some files that match Via and
> User-Agent headers.

That might qualify as a lot.

> A quick search for squid gnumalloc regex, etc. etc. didn't yield too many
> useful results. Can you clue me into the problems (and solutions!) you're
> referring to? Is it gnumalloc or the regex that is the problem? Is
> dlmalloc the solution?

The problem was gnumalloc + gnuregex + time == fail. The solution was Google malloc; but I'd really suggest you hook up oprofile first to see where the CPU is going before you try another malloc. No, dlmalloc isn't the solution. :)

Adrian

--
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -
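For reference, the "Google malloc" mentioned here is tcmalloc from google-perftools. One way to try it without rebuilding Squid is an LD_PRELOAD, sketched below; the library path is an assumption and varies by distro, and `-N` keeps Squid in the foreground so the preload applies to the worker process.

```shell
# Hypothetical library path; check where your distro installs tcmalloc.
# LD_PRELOAD makes the dynamic linker resolve malloc/free from tcmalloc
# instead of glibc's allocator, for this process only.
LD_PRELOAD=/usr/lib64/libtcmalloc_minimal.so /usr/sbin/squid -N -f /etc/squid/squid.conf
```

As Adrian says, profile first: if oprofile shows the time in regex matching rather than in malloc/free, swapping allocators won't help.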
Re: [squid-users] High CPU usage and degraded service time after 2 weeks of activity
What is the swap usage? I once had the same problem with Squid degrading over time. I had to reduce cache_mem from 2 GB to 512 MB and reduce the number of objects in the cache, since the index was growing too big.

mike

At 11:35 AM 2/26/2008, Guillaume Smet wrote:
> Hi squid-users,
>
> We recently experienced a problem on our new Squid setup (2 Squid
> servers configured as reverse proxy [...])
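Mike's index concern can be sanity-checked against the Squid FAQ rule of thumb of roughly 10 MB of index RAM per GB of cache_dir (more per GB on 64-bit builds), applied here to the 8000 MB cache_dir quoted earlier in the thread:

```shell
# Rule-of-thumb index estimate (Squid FAQ: ~10 MB of RAM per GB of disk
# cache); 8000 MB is the cache_dir size from the configuration above.
cache_dir_mb=8000
index_mb=$(( cache_dir_mb * 10 / 1024 ))
echo "estimated index: ~${index_mb} MB"
```

With only an ~8 GB cache_dir the index should stay well under 100 MB, so in this particular setup cache_mem and in-memory objects, not the disk index, likely dominate the 3.2 GB footprint.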
Re: [squid-users] High CPU usage and degraded service time after 2 weeks of activity
On Wed, Feb 27, 2008 at 12:22 AM, leongmzlist [EMAIL PROTECTED] wrote:
> What is the swap usage? I once had the same problem w/ squid degrading
> over time. I had to reduce the cache_mem from 2GB to 512MB, and reduce
> the amount of objects in the cache since the index was growing too big.

I forgot to mention it. No swap at all.