Re: [AOLSERVER] error: "unable to realloc XXXX bytes"

Hossein Sharifi Thu, 17 Aug 2006 15:29:42 -0700

Hi Dossy,

I have tried setting stacksize to 512k / 1MB, and I still receive the realloc error. I've also disabled overcommit (by setting vm.overcommit_memory to 2) and it didn't help, unfortunately. For reference, here were the stats:


$ grep Commit /proc/meminfo
CommitLimit:   2081908 kB
Committed_AS:   326728 kB

$ sysctl vm.overcommit_{memory,ratio}
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
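
As a rough cross-check of those figures, the gap between CommitLimit and Committed_AS can be computed directly. This is only a minimal sketch: the function name commit_headroom is made up for illustration, and it assumes the two fields appear in the usual /proc/meminfo layout shown above. Note that the kernel only enforces CommitLimit when vm.overcommit_memory is 2; in mode 0 it applies a heuristic instead.

```shell
# Hypothetical helper: print CommitLimit minus Committed_AS, in kB.
# Reads /proc/meminfo-style text from the file named in $1
# (defaults to /proc/meminfo when no argument is given).
commit_headroom() {
    awk '
        $1 == "CommitLimit:"  { limit = $2 }   # total commit allowance, kB
        $1 == "Committed_AS:" { used  = $2 }   # currently committed, kB
        END { printf "%d\n", limit - used }
    ' "${1:-/proc/meminfo}"
}
```

With the two values quoted above this comes to 1755180 kB, so committed address space was nowhere near the limit when the stats were taken.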

>How much swap do you have configured?

I have 512MB of swap configured, and its usage seems to vary between 0kb and 150kb. I think 512MB is sufficient, but I can increase it if necessary. After some experimentation last night, I was able to get the memory usage past 1G by running some simple scripts like this in parallel:

set a "1" ; for { set i 0 } { $i < 300000000 } { incr i } { append a "11111111111111" }


The usage climbed to:
12199 nsadmin   17   0 1591m 1.0g 2984 S  200 34.8   3:37.78 nsd

and it seemed to be stable while running these scripts.

I am beginning to think it's less of a memory issue and more of a "something's not thread safe" issue (even though I've removed all my custom modules). I can't reproduce this problem with simulations, but as soon as I switch my real users to the new server, it begins restarting within a couple of minutes. The fact that it restarts at around 90-110MB could just be a coincidence.

But I have no idea how to find the culprit. When I switched from 3.3+ad13 to 4.0.10, my server crashed all the time, and I debugged nsd and found that ns_server was at fault. So I removed references to it, and everything worked fine.

Now it's crashing in the Tcl interpreter, and I have no idea how to get the underlying Tcl code from that (and even if I did, it's probably not related to the actual cause of the issue, which is more likely heap corruption from a different thread).

Since all things are otherwise equal between my FC4 and FC5 boxes, that might indicate a glibc issue or something similar. But I've found it difficult to link against a different glibc (since your binutils/gcc/kernel have to match closely, and upgrading glibc can break the rest of your system).

Some more info, if it helps: My server receives a medium-high amount of traffic, about 4 million requests/day, sometimes as high as 200 requests/second. I don't use ns_server anymore, but I do use [ns_info pageroot], the "source" command, exec (to run imagemagick, expect scripts, and other things - probably 1-2 execs/second). I get/set something in ns_cache on almost every request, although the caches themselves are relatively small (200-300k), and I believe I was able to reproduce the problem even with ns_cache disabled. I use postgresql over tcp/ip, 3 pools, 5 connections per pool (I tried Marc's maxidle/open suggestion with no success). And I don't use ADP at all - only tcl/static content.
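
To put the traffic figures in proportion, here is a back-of-the-envelope sketch of the average rate and of how far the peak sits above it. The function names avg_rps and peak_ratio are hypothetical, and the arithmetic is integer-only, so results are rounded down.

```shell
# Hypothetical helpers for rough traffic arithmetic (integer math, rounds down).

# Average requests per second, given requests per day in $1.
avg_rps() {
    echo $(( $1 / 86400 ))          # 86400 seconds in a day
}

# How many times the peak rate ($2, requests/second) exceeds the
# daily average implied by $1 (requests/day).
peak_ratio() {
    echo $(( $2 * 86400 / $1 ))
}
```

For 4 million requests/day, avg_rps 4000000 gives 46, and peak_ratio 4000000 200 gives 4, so the 200 requests/second bursts are a little over four times the average load.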


Thanks for all your help so far.

-Hossein


--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to <[EMAIL PROTECTED]> 
with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: 
field of your email blank.
