Hi Patrick,
Thanks for your mail.
I don't think it is a swap problem; I have 1 GB. I will try the Mersenne
tester and let you know the result.

I tracked the times when my server became unavailable and compared them with
the AOLserver logs, and I found a coincidence.
In /var/log/messages, this is the gap between the last entry and the restart
signal:

Jul 23 21:27:48 localhost su(pam_unix)[1103]: session opened for user root by nsadmin(uid=502)
Jul 23 21:46:21 localhost syslogd 1.4-0: restart.


Then, in the log file of one of the services, there is this:
[23/Jul/2001:21:18:26 -0400] "GET /uptime.txt HTTP/1.0" 200 7 "" ""
[23/Jul/2001:21:27:28][987.1026][-sched-] Notice: Running scheduled proc wd_mail_errors...
[23/Jul/2001:21:27:28][987.1026][-sched-] Notice: Looking for errors...
[23/Jul/2001:21:55:10][1039.1024][-main-] Notice: nsmain: AOLserver/3.2+ad12 starting
[23/Jul/2001:21:55:10][1039.1024][-main-] Notice: nsmain: security info: uid=502, euid=502, gid=501, egid=501


Check out the times:
Jul 23 21:27:48 last linux box log
[23/Jul/2001:21:27:28] last server log
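
To make the comparison concrete, here is a small sketch that parses the timestamps quoted above and computes the gaps between them. It assumes the syslog entries are from 2001 (syslog lines carry no year, so the year is borrowed from the AOLserver log) and that both logs use the same local clock. The function names are made up for illustration.

```python
from datetime import datetime


def parse_syslog(stamp, year=2001):
    """Parse a syslog-style stamp such as 'Jul 23 21:27:48'.

    The year is an assumption: syslog does not record it.
    """
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def parse_aolserver(stamp):
    """Parse an AOLserver log stamp such as '23/Jul/2001:21:27:28'."""
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S")


last_syslog = parse_syslog("Jul 23 21:27:48")
last_server = parse_aolserver("23/Jul/2001:21:27:28")
syslog_restart = parse_syslog("Jul 23 21:46:21")

# Seconds between the last server log entry and the last syslog entry.
gap_seconds = (last_syslog - last_server).total_seconds()

# Seconds between the last syslog entry and the syslogd restart.
silence_seconds = (syslog_restart - last_syslog).total_seconds()

print(gap_seconds, silence_seconds)
```

Both logs stop within seconds of each other, right when the scheduled proc starts, and nothing is logged again until syslogd restarts.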

This specific server is running OpenACS and uses nsd76 (because it has no
problems with Spanish characters).

I have seen the same coincidence between this service and the last access
time on other occasions. Any suggestions?

Thank you,
Rocael.



Patrick Giagnocavo <[EMAIL PROTECTED]> wrote:
Hi,

I can think of two cases that you might want to check:

1.  You don't have enough swap space, and when scheduled procs run during
the middle of the night you run out of swap and the machine dies.

2.  Could it be a sometimes-bad RAM module?  To check this, get the Mersenne
prime tester program from www.mersenne.org.  Ignore the setup part, just get
it to run the tests.  It will heavily stress your RAM and the CPU's cache
and memory interface.  Note:  it will use all your available CPU and a few
megs of RAM, but only the CPU time that other processes are not using.

The only other thing I could suggest would be to run a cron script that
grabs a web page every 10 minutes or so and emails the result to you.  Get
the nstelemetry.adp file from aolserver.com and then set up your cron script
to grab the page via lynx and email it to you.  Then you will have a chance
at catching the error shortly before it occurs, or at least getting some
useful diagnostics.
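
A rough sketch of such a cron job, written in Python rather than lynx plus mail, could look like the following. Everything specific here is an assumption for illustration: the function names, the monitor@localhost sender, and an SMTP server listening on localhost. The page URL and the recipient are passed on the command line.

```python
import smtplib
import sys
import urllib.request
from email.message import EmailMessage


def fetch_page(url, timeout=30):
    """Return the page body as text, or a description of the failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return f"FETCH FAILED: {exc}"


def build_message(body, sender, recipient, url):
    """Wrap the fetched page (or the failure text) in an email message."""
    msg = EmailMessage()
    msg["Subject"] = f"Telemetry snapshot: {url}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg


def send_snapshot(url, recipient, sender="monitor@localhost",
                  smtp_host="localhost"):
    """Fetch the page and mail it; assumes a local SMTP server."""
    msg = build_message(fetch_page(url), sender, recipient, url)
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


# Only runs when called as: script.py <url> <recipient>
if __name__ == "__main__" and len(sys.argv) == 3:
    send_snapshot(sys.argv[1], sys.argv[2])
```

A crontab entry along the lines of `*/10 * * * * python3 /path/to/snapshot.py http://localhost/nstelemetry.adp you@example.com` would run it every 10 minutes; the script path, URL, and address are placeholders. A failed fetch is mailed as well, so a gap in the incoming messages narrows down when the machine stopped responding.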

Cordially

Patrick Giagnocavo
[EMAIL PROTECTED]
OpenACS Hosting:  www.zill.net
