It appears that you have the same problem on all of your servers; the
goal is to find out what part of the code is failing and under what
conditions. Three things stand out: the failed servers are under a
heavier load than those that don't exhibit the failure; the failure
happens shortly after shifting a load via pound onto an already running
AOLserver instance; and the failure happens after a reload of your procs
on that server. Since you've said that none of the loads is heavy, I
don't think this problem is triggered by AOLserver being overwhelmed
with traffic. That leaves two candidates: the shifting of the load
itself onto an already running AOLserver instance, and the reloading of
the procs on that instance. I suspect the problem is related to
reloading the procs with ns_eval, not to load shifting or load volume,
but we need to confirm that.
Can you run an AOLserver instance that answers requests directly,
without pound in front of it? Maybe you could set up a test server and
drive it with http_load or ApacheBench (ab). Once it's running, hit it
with a load and see if it stays up for at least 10-20 minutes. If it
does, reload your procs on that server without doing anything else; I
expect the AOLserver instance will crash shortly after the proc reload.
You can then restart the server and try again, this time reloading the
procs immediately. Then repeat, but reload the procs after 5 minutes or
so. In each case, determine how long it takes the server to crash after
the proc reload (make sure the AOLserver instance has started and
continues to serve connections before, during and after the reload).
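To put a number on "how long it takes the server to crash", a small watcher like the following Python sketch could help. It is only a sketch under assumptions: the URL is hypothetical, a failed connection counts as "crashed", the load itself still comes from http_load or ab, and the proc reload is done by hand. It probes the server once per interval and reports the seconds between the reload and the first failed probe.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import List, Optional, Tuple

# Hypothetical address of the test AOLserver instance; adjust to your setup.
URL = "http://localhost:8000/"


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the server sends back any HTTP response, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # A 404 or 500 status still means the server process is alive.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused or reset connection, timeout, or garbled reply: treat as down.
        return False


def seconds_until_down(samples: List[Tuple[float, bool]],
                       reload_at: float) -> Optional[float]:
    """Return seconds from reload_at to the first failed probe, or None.

    A negative result means the server was already down before the reload,
    which would point away from the reload as the trigger.
    """
    for ts, is_up in samples:
        if not is_up:
            return ts - reload_at
    return None


def monitor(url: str, reload_at: float, duration: float,
            interval: float = 1.0) -> Optional[float]:
    """Probe url every `interval` seconds for up to `duration` seconds.

    Stops at the first failed probe and returns seconds_until_down().
    """
    samples: List[Tuple[float, bool]] = []
    deadline = time.time() + duration
    while time.time() < deadline:
        up = probe(url)
        samples.append((time.time(), up))
        if not up:
            break
        time.sleep(interval)
    return seconds_until_down(samples, reload_at)
```

For example, do the reload and immediately call `monitor(URL, time.time(), duration=20 * 60)`; a `None` result means the instance survived the whole window. To test a delayed reload, pass a `reload_at` a few minutes in the future, start the watcher, and do the reload at that moment.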
If anyone else is experiencing the same problems, please post your
information along with your configuration.
/s.
On Oct 29, 2008, at 5:29 AM, Rami Jadaa wrote:
Hi Scott,
Thanks for your reply.
I don't think I can send the log; it would be very large, since
AOLserver initializes and loads a lot of ACS code...
As for the checksum, we did the following:
using pound, we shifted the load going to this web server to a server
on another machine that uses its own local copy of the same
application. After the reload, the server we shifted the load to
crashed, and the original one didn't!
So I can rule out file corruption, right?
On Tue, Oct 28, 2008 at 7:50 PM, Scott Goodwin <[EMAIL PROTECTED]> wrote:
Rami,
Tcl is attempting to create a new hash table entry in a hash table
that has already been deleted. Tcl_DeleteHashTable arranges for any
later use of the table to panic with exactly this message, so most
likely something is still holding a pointer to that table after it
was deleted. This could be something in AOLserver that uses the
Tcl_Hash* API.
First steps:
1. Send a copy of the nslog output for a clean startup through to
the point where it crashes; that might indicate where it's getting
fouled up. If that portion of the nslog is not very long (say, no
more than 100-150 lines), you can cut and paste it into the message;
otherwise attach it as a separate file (but trim it to the smallest
necessary size; we don't want multi-megabyte files).
2. Compute a checksum of each of your own Tcl code files used by
AOLserver on a known-good machine, and of the same Tcl files on the bad
one; compare the two outputs to see which Tcl files on the bad machine
differ from the good one. Investigate those differences.
/s.
On Oct 28, 2008, at 10:48 AM, Rami Jadaa wrote:
Hello Everyone,
We are running multiple instances of AOLserver on different
machines, and I am enjoying the ability to reload the proc libraries
in each of them using ns_eval source {fileName}...
However, one of the AOLservers crashes a few minutes after the
reload.
The strange thing is that this is the only AOLserver that crashes,
while the others don't! Just before the crash, I noticed the
following error, which means something breaks at the C level. I am
assuming it could be in the Tcl interpreter (currently Tcl 8.4.16)
rather than in AOLserver, but this is only an assumption:
"called Tcl_CreateHashEntry on deleted table"
We use this server to serve multiple domains and have a pound load
balancer in front. For example, if a request comes in for www.xyz.com
we serve the xyz site and content, and if a request comes in for
www.abc.com we serve the abc site and content. In total we serve
around 25 different sites this way. We are not using any virtual
hosting module or feature of AOLserver. The total traffic on the
server is not high.
Does anybody have any idea? Has anyone using the reload functionality
noticed that it can crash AOLserver?
Environment:
AOLserver 4.0.10, fetched from CVS about 6 months ago
nsoracle Oracle driver version 2.8a1
nsmysql from CVS
Oracle 10gR2 libraries
AMD x86_64, RHEL 4
Tcl 8.4.16 (current); also tried Tcl 8.4.11
Please help, as this is driving me crazy :(
Thanks in advance
--
AOLserver - http://www.aolserver.com/
To Remove yourself from this list, simply send an email to <[EMAIL PROTECTED]>
with the body of "SIGNOFF AOLSERVER" in the email message. You can leave the
Subject: field of your email blank.