Re: [squid-users] High CPU usage and degraded service time after 2 weeks of activity

2008-06-17 Thread Guillaume Smet
Tony,

On Mon, Jun 16, 2008 at 7:31 PM, Anthony Tonns [EMAIL PROTECTED] wrote:
 Did you ever find a resolution to this issue? I'm running a very similar
 config and running into very similar problems, only on more servers
 using more memory, with the RHEL squid package on CentOS 5 x86_64. Same
 symptoms: no paging going on, and only 5.5G of the 8G of RAM in use. It
 will run fine for a few days, but then Squid will completely consume one
 of the four cores in the system (two dual-core AMD Opteron(tm) Processor
 2212 CPUs), whereas after a restart it uses only 10-20% of one core. The
 only significant difference other than sizing is that I have
 memory_replacement_policy set to lru instead of heap GDSF.

No. I let the broken Squid run for a few days, waiting for someone to
help us diagnose the problem, but as I didn't get any answer, we
restarted Squid because the service was really degraded.

Restarting solved the problem, and it hasn't come back so far.

FYI, we don't have a lot of regexp rules (a few refresh patterns and
around 20 user-agent ACLs). I tried to oprofile the production Squid
while we had the problem, but didn't succeed. If someone has a good
oprofile tutorial, I'm more than interested, as I haven't found anything
useful yet.
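
For anyone in the same boat, the minimal session I was hoping to run
looks roughly like this (an untested sketch - on CentOS 5 the package
should just be oprofile, but paths may differ on your install):

  opcontrol --init
  opcontrol --no-vmlinux               # user-space profiling only
  opcontrol --start
  # ... let the degraded Squid serve traffic for a few minutes ...
  opcontrol --dump
  opreport --symbols /usr/sbin/squid   # per-function CPU breakdown
  opcontrol --shutdown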

Rest assured I'll keep the list informed if I have any news about the problem.

-- 
Guillaume


RE: [squid-users] High CPU usage and degraded service time after 2 weeks of activity

2008-06-16 Thread Anthony Tonns
Guillaume,

Did you ever find a resolution to this issue? I'm running a very similar
config and running into very similar problems, only on more servers
using more memory, with the RHEL squid package on CentOS 5 x86_64. Same
symptoms: no paging going on, and only 5.5G of the 8G of RAM in use. It
will run fine for a few days, but then Squid will completely consume one
of the four cores in the system (two dual-core AMD Opteron(tm) Processor
2212 CPUs), whereas after a restart it uses only 10-20% of one core. The
only significant difference other than sizing is that I have
memory_replacement_policy set to lru instead of heap GDSF.

I haven't had the opportunity to put Squid into debug mode yet to see
whether I get the same errors in the logs, but there's nothing fishy in
cache.log with debug_options ALL,1 33,2 set.

Thanks,
Tony

 -Original Message-
 From: Guillaume Smet [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, February 26, 2008 2:35 PM
 To: squid-users@squid-cache.org
 Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: [squid-users] High CPU usage and degraded service time after 2
 weeks of activity
 
 Hi squid-users,
 
 We recently experienced a problem on our new Squid setup (2 Squid
 servers configured as reverse proxies - mostly the same configuration
 as before, except that we allocated more memory and disk on the new
 servers; the old boxes didn't have this problem). After 2 weeks of very
 good performance, both Squid instances began to use a lot of CPU
 (between 75 and 100% of one core instead of between 0 and 10%), and
 performance became really bad, especially during peak hours.
 Same number of queries/s, same hit ratio, but service time was really
 degraded. We let both Squids run like that for a week and the
 situation didn't improve.
 
 We restarted both Squid servers today and it fixed the problem for
 now: service time is back to normal.
 
 We found nothing in cache.log. We decided to run one of the servers
 with full debug for a couple of minutes to see if we could find useful
 information. During those two minutes, we got a lot of
 "clientReadRequest: FD XXX: no data to process ((11) Resource
 temporarily unavailable)" lines in the logs (20k in those 2 minutes),
 but we don't know whether it's related. Here is a bit of context around
 one "Resource temporarily unavailable" line:
 2008/02/26 17:25:55| destroying entry 0x3b0e2500: 'Connection: keep-alive'
 2008/02/26 17:25:55| cbdataFree: 0x2aab3c2ce2a8
 2008/02/26 17:25:55| cbdataFree: 0x2aab3c2ce2a8 has 1 locks, not freeing
 2008/02/26 17:25:55| clientKeepaliveNextRequest: FD 659 reading next req
 2008/02/26 17:25:55| commSetTimeout: FD 659 timeout 120
 2008/02/26 17:25:55| clientReadRequest: FD 659: reading request...
 2008/02/26 17:25:55| clientReadRequest: FD 659: no data to process
 ((11) Resource temporarily unavailable)
 2008/02/26 17:25:55| cbdataLock: 0x2aab0bdf7418
 2008/02/26 17:25:55| cbdataValid: 0x2aab0bdf7418
 2008/02/26 17:25:55| cbdataUnlock: 0x2aab0bdf7418
 2008/02/26 17:25:55| commSetSelect: FD 659 type 1
 2008/02/26 17:25:55| commSetEvents(fd=659)
 2008/02/26 17:25:55| cbdataUnlock: 0x2aab3c2ce2a8
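 
 For what it's worth, a capture window like this doesn't require a
 restart: full debugging can be toggled on a running Squid (standard
 squid -k machinery), along the lines of:
 
   squid -k debug    # toggle full debug output on
   sleep 120         # capture a couple of minutes of traffic
   squid -k debug    # toggle full debug back off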
 
 We also noticed that we have negative numbers in the memory
 information reported by cachemgr, but we don't know if it's relevant:
 Memory usage for squid via mallinfo():
   Total space in arena:  -1419876 KB
   Ordinary blocks:       -1420149 KB    579 blks
   Small blocks:                 0 KB      0 blks
   Holding blocks:            7564 KB      8 blks
   Free Small blocks:            0 KB
   Free Ordinary blocks:       272 KB
   Total in use:          -1412585 KB  100%
   Total free:                 272 KB    0%
   Total size:            -1412312 KB
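 
 (A plausible explanation for the negative numbers, for what it's worth:
 mallinfo() reports 32-bit signed byte counts, so totals past 2 GB wrap
 around. Adding 2^32 bytes, i.e. 4194304 KB, back in gives
 -1412312 KB + 4194304 KB = 2781992 KB, roughly 2.7 GB, which would be
 consistent with the ~3.2 GB process size mentioned below.)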
 
 Background information:
 CentOS 5 x86_64
 Squid 2.6STABLE18
 8GB of memory
 one Xeon E5345 @ 2.33GHz per box
 ~ 15 Mb/s per box during peak hours
 ~ 200 requests/s
 
 Cache configuration:
 cache_mem 2000 MB
 cache_dir aufs /data/services/squid/cache 8000 16 256
 
 cache_swap_low 90
 cache_swap_high 95
 cache_replacement_policy lru
 memory_replacement_policy heap GDSF
 maximum_object_size_in_memory 150 KB
 
 It's a reverse proxy setup with several ACLs, 2 active ports, 2 delay
 pools, and ICP between the two servers, but nothing really fancy. I
 can provide the full squid.conf if needed.
 
 The Squid process was using approximately 3.2 GB of memory on each box.
 
 Does anybody have any idea how we can fix this problem, or how we can
 diagnose what is happening?
 
 Thanks in advance.
 
 --
 Guillaume


Re: [squid-users] High CPU usage and degraded service time after 2 weeks of activity

2008-06-16 Thread Adrian Chadd
Are you using lots of regular expression rules?
Is this under Linux?

The wikipedia guys had a big problem with gnumalloc + regex rules causing
Squid to degrade much like what you've described.

You should be able to install the oprofile profiling suite on CentOS/RHEL;
I suggest doing that (along with the debugging version of the libc
package) and then getting some CPU time profiles of Squid running
normally versus running abnormally. Lodge all of that via Bugzilla.

(oprofile (Linux) and hwpmc (FreeBSD) rock.)



Adrian

On Mon, Jun 16, 2008, Anthony Tonns wrote:
 Guillaume,
 
 Did you ever find a resolution to this issue? I'm running a very similar
 config and running into very similar problems, only on more servers
 using more memory, with the RHEL squid package on CentOS 5 x86_64. Same
 symptoms: no paging going on, and only 5.5G of the 8G of RAM in use. It
 will run fine for a few days, but then Squid will completely consume one
 of the four cores in the system (two dual-core AMD Opteron(tm) Processor
 2212 CPUs), whereas after a restart it uses only 10-20% of one core. The
 only significant difference other than sizing is that I have
 memory_replacement_policy set to lru instead of heap GDSF.
 
 I haven't had the opportunity to put Squid into debug mode yet to see
 whether I get the same errors in the logs, but there's nothing fishy in
 cache.log with debug_options ALL,1 33,2 set.
 
 Thanks,
 Tony
 
 [snip - Guillaume's original message, quoted in full earlier in the thread]

RE: [squid-users] High CPU usage and degraded service time after 2 weeks of activity

2008-06-16 Thread Anthony Tonns
 Are you using lots of regular expression rules?
 Is this under Linux?
 
 The wikipedia guys had a big problem with gnumalloc + regex rules
 causing Squid to degrade much like what you've described.
 
 You should be able to install the oprofile profiling suite on
 CentOS/RHEL; I suggest doing that (along with the debugging version of
 the libc package) and then getting some CPU time profiles of Squid
 running normally versus running abnormally. Lodge all of that via
 Bugzilla.
 
 (oprofile (Linux) and hwpmc (FreeBSD) rock.)

CentOS is Linux ;-) so I will look into getting oprofile set up in our
sandbox environment and hammering away as a first step.

Also, what do you consider "lots" of regex rules? I have about 20 rules
that match on req_header with a regex of .* (i.e. "does this header
exist?"), and then around 150 lines in some files that match Via and
User-Agent headers.
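
To be concrete, the rules are along these lines (the ACL names and file
paths here are made up for illustration):

  # "does this header exist" checks:
  acl has_via req_header Via .*
  acl has_xff req_header X-Forwarded-For .*
  # ~150 patterns loaded from files, one regex per line:
  acl proxy_via req_header Via "/etc/squid/via-patterns.txt"
  acl bot_agents req_header User-Agent "/etc/squid/ua-patterns.txt"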

A quick search for "squid gnumalloc regex" and the like didn't yield
many useful results. Can you clue me in on the problems (and solutions!)
you're referring to? Is it gnumalloc or the regex that is the problem?
Is dlmalloc the solution?

Thanks,
Tony



Re: [squid-users] High CPU usage and degraded service time after 2 weeks of activity

2008-06-16 Thread Adrian Chadd
On Mon, Jun 16, 2008, Anthony Tonns wrote:

 CentOS is Linux ;-) so I will look into getting oprofile set up in our
 sandbox environment and hammering away as a first step.

Ok.

 Also, what do you consider "lots" of regex rules? I have about 20 rules
 that match on req_header with a regex of .* (i.e. "does this header
 exist?"), and then around 150 lines in some files that match Via and
 User-Agent headers.

That might qualify as a lot.

 A quick search for "squid gnumalloc regex" and the like didn't yield
 many useful results. Can you clue me in on the problems (and solutions!)
 you're referring to? Is it gnumalloc or the regex that is the problem?
 Is dlmalloc the solution?

The problem was gnumalloc + gnuregex + time == fail.

The solution was Google malloc, but I'd really suggest you hook up
oprofile first to see where the CPU is going before you try another
malloc. No, dlmalloc isn't the solution. :)
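
If you do get to the point of trying another allocator, the least
invasive route is usually to preload it rather than rebuild Squid (a
sketch only - the package name and library path on CentOS 5 are
assumptions, so check your install):

  yum install google-perftools
  LD_PRELOAD=/usr/lib64/libtcmalloc.so /usr/sbin/squid -N -d1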



Adrian

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -


Re: [squid-users] High CPU usage and degraded service time after 2 weeks of activity

2008-02-26 Thread leongmzlist
What is the swap usage? I once had the same problem with Squid
degrading over time. I had to reduce cache_mem from 2GB to 512MB and
reduce the number of objects in the cache, since the index was growing
too big.
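
In squid.conf terms the change was along these lines (the sizes and the
cache_dir path here are illustrative, not my exact values):

  # before: cache_mem 2048 MB
  cache_mem 512 MB
  # plus a smaller cache_dir, so the in-core index stays manageable:
  cache_dir aufs /var/spool/squid 4096 16 256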



mike

At 11:35 AM 2/26/2008, Guillaume Smet wrote:

[snip - Guillaume's original message, quoted in full earlier in the thread]




Re: [squid-users] High CPU usage and degraded service time after 2 weeks of activity

2008-02-26 Thread Guillaume Smet
On Wed, Feb 27, 2008 at 12:22 AM, leongmzlist [EMAIL PROTECTED] wrote:
 What is the swap usage? I once had the same problem with Squid
 degrading over time. I had to reduce cache_mem from 2GB to 512MB and
 reduce the number of objects in the cache, since the index was growing
 too big.

I forgot to mention it. No swap at all.