Owen writes:
>
> We had a lot of difficulties because long jobs which were running
> during the weekly bos restarts tended to abort, and because often
> the bos server simply stopped without restarting. At first, we
> moved the weekly restart to Tuesday, so that most machines could
> be checked manually soon after restarting. Further problems arose
> because of the restarting of the three data base servers at the
> same time, so we moved those to Wednesday, Thursday, and Friday.
>
> Eventually, about a year ago, we disabled the weekly bos restart
> completely. We have not since seen any problems which we can
> definitely attribute to the lack of a weekly restart, only the
> usual problems. And these are far fewer than they were two years
> ago. One of the data base servers recently had to be rebooted
> after nine months of uninterrupted running.
>
Older versions of bosserver didn't close the rx descriptor before
starting their child processes, so the children inherited it.
When bosserver then did its weekly restart, the new copy
of bosserver would try to bind to the port that the old copy
had just closed. However, if any children, or more likely
backgrounded children of those children, were still around,
they'd still be "using" the rx port, and so the restart would
fail. This was true up to 3.3a. There were some clever ways around
this; one I like is using "csh", which closes unused file descriptors
and so "fixes" the problem. However, as of 3.4a (& 3.4?) there is a new
solution: bosserver, or rather rx, is now smart enough to mark the file
descriptor "close on exec", which means children won't inherit the
descriptor, and so restarts will work a lot more smoothly.
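
To make the inheritance point concrete, here is a minimal sketch in
Python. It is not bosserver or rx code; the function names are made up
for illustration, and it assumes a Unix system with Python 3.5 or
later. It binds a UDP socket (rx runs over UDP), then checks whether an
exec'd child sees the descriptor with and without the close-on-exec
flag set.

```python
import fcntl
import socket
import subprocess
import sys

# Hypothetical helper: set or clear FD_CLOEXEC on a descriptor.
def set_close_on_exec(fd, enabled):
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    if enabled:
        flags |= fcntl.FD_CLOEXEC
    else:
        flags &= ~fcntl.FD_CLOEXEC
    fcntl.fcntl(fd, fcntl.F_SETFD, flags)

# Small script run in the child: exit 0 only if the given descriptor
# number is open there and refers to a socket.
PROBE = (
    "import os, stat, sys\n"
    "try:\n"
    "    mode = os.fstat(int(sys.argv[1])).st_mode\n"
    "except OSError:\n"
    "    sys.exit(1)\n"
    "sys.exit(0 if stat.S_ISSOCK(mode) else 1)\n"
)

def child_sees_fd(fd):
    # close_fds=False so that only the close-on-exec flag decides
    # whether the child inherits the descriptor.
    result = subprocess.run(
        [sys.executable, "-c", PROBE, str(fd)], close_fds=False
    )
    return result.returncode == 0

def demo():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))  # any free port stands in for the rx port
    fd = sock.fileno()

    set_close_on_exec(fd, False)   # old behaviour: children inherit
    inherited_before = child_sees_fd(fd)

    set_close_on_exec(fd, True)    # newer behaviour: closed on exec
    inherited_after = child_sees_fd(fd)

    sock.close()
    return inherited_before, inherited_after
```

Calling demo() should report that the child holds the socket in the
first case and not in the second. In the first case a lingering child
would keep the port busy after the parent closed its own copy, which is
the situation in which a restarted server's bind fails.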

Long-running jobs that abort sound like a different problem.
I'm not sure what was happening there, or whether that's still a
problem. It does seem unlikely, though, that bosserver would
handle long-running bos cron jobs smoothly across restarts;
that would require it to keep dynamic information around
between restarts, and the whole point of a bosserver restart
is to get rid of all the dynamic state information.
-Marcus