>>>>> "Renata" == RENATA <[EMAIL PROTECTED]> writes:
Renata> At our lab we tend to run many jobs during the night hours in
Renata> addition to the normal heavy load of people running jobs
Renata> during the day. This has narrowed down the times during which
Renata> we can allow the weekly bos restart to run without having any
Renata> impact (re: bos setrestart). Has anyone tried turning off the
Renata> automatic bos restart that by default happens at 4:00
Renata> a.m. Sundays?
We have been running without the automatic restart for about 8 or 9
months now.
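
For reference, a minimal sketch of how the weekly restart can be
turned off, assuming the standard "bos setrestart" syntax with
"-time never" and the "-general" flag. The server name is
hypothetical, and the helper functions only print the commands so
they can be reviewed before being run with administrative
privileges.

```shell
#!/bin/sh
# Hypothetical server name; substitute a machine from your own cell.
SERVER="fs1.example.com"

# Print the command that disables the weekly (general) restart.
# Nothing is executed here; the output is meant to be reviewed first.
disable_restart_cmd() {
    echo "bos setrestart -server $1 -time never -general"
}

# Print the command that reports the restart times currently set.
show_restart_cmd() {
    echo "bos getrestart -server $1"
}

disable_restart_cmd "$SERVER"
show_restart_cmd "$SERVER"
```

Running the printed getrestart command afterwards should confirm
whether the change took effect on that server.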
Renata> Has anyone seen problems result from turning off bos restart?
Well, yes, actually. There is a bug in the AFS 3.3a vlserver/ptserver
(and possibly any of the servers which speak Ubik, although I'm not
sure) which has burned us on long-running database servers.
After a few months of error-free service, in one of our cells the Ubik
master for the VLDB suddenly started giving negative answers to all
queries, even though the VLDB was intact and the 3 server processes
had quorum.
The impact was that about 1/3 of our dataless AFS clients crashed
completely (anything paging out of AFS died with SIGBUS). This was
not pleasant.
This is supposed to be fixed in AFS 3.4a, but the upgrade process in
a production environment is going to take a couple of months (we have
over 25 cells globally). Transarc may have a patch for this available
(we are expecting to receive said patch this week). Contact
filesystem support for more information. The TRACS number we have for
this is TR-18052.
Renata> And lastly, if there are any other sites out there in a
Renata> situation similar to ours, what has been your solution?
As a matter of policy, automatic restart is viewed as an ugly
workaround for memory leaks and other problems which prevent
processes from running indefinitely. It is an ugly fact of life in
some cases, but we avoid it whenever possible.
Phil