Bill,
Thanks for your reply. We have done some more investigation and have
determined that the problem is with sendbackup. It does ufsdump | sed |
ufsrestore. When this starts it takes the CPU to 100% and stays there. The
performance monitoring soon quits updating. Log messages indicate that
sendmail sees the load average too high and quits processing the queue. The
only recovery is to turn the machine off and back on.

The data on the largest partition was slightly greater than 1 GB. We had 2
holding partitions, each slightly less than 1 GB. We tried combining the 2
partitions with DiskSuite to get a larger volume, but this did not fix the
problem.

The only patch on the web site for 2.4.2p2 seems to be for IRIX and Tru64,
not Solaris.

Eva Freer

-----Original Message-----
From: Bill Carlson [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 15, 2001 10:16 AM
To: Eva Freer
Cc: [EMAIL PROTECTED]
Subject: Re: Solaris 8 Server hangs during backup


On Tue, 14 Aug 2001, Eva Freer wrote:

> We have a highly subnetted configuration of Solaris 8 and 2.6 boxes,
> mostly E220R's. The subnets are connected via firewalls. Each subnet has
> its own Amanda server with an Exabyte Mammoth tape drive. We use hardware
> compression only. The Amanda is 2.4.2p1 on most nodes.
>
> Originally, we seemed to have a problem with only one subnet, with a
> Solaris 2.6 server, 2 Solaris clients, and 1 Solaris 8 client. The server
> would hang during the backup and required a poweroff reboot. Part of the
> backup would

!?!
I've never seen anything with amanda that actually killed the machine. A
heavily overloaded machine will seem dead, but should eventually respond.

> The problem now affects at least 2 of the subnets. In both cases, the
> Amanda server is Solaris 8 with 1 Solaris 8 client and 2 Solaris 2.6
> clients. One server hangs every night while the other is intermittent.
> Both are configured to use 2 ~1 GB holding partitions. Eliminating the
> holding partitions did not prevent the hangup. The largest disk backed up
> contains slightly more than the capacity of 1 of the holding partitions.
> The server

How full is the largest partition? For holding disk purposes, the
important part is how much actual data you have, not the size of the
filesystem.
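
To make that distinction concrete, here is a minimal Python sketch. It is
an illustration only, not part of Amanda: the helper names are made up, and
it assumes an uncompressed full dump is roughly the size of the data in
use on the filesystem.

```python
import shutil

def used_bytes(mount_point):
    """Bytes actually in use on the filesystem at mount_point
    (not the size of the filesystem itself)."""
    return shutil.disk_usage(mount_point).used

def fits_in_holding(data_bytes, holding_free_bytes):
    """True if a dump of about data_bytes fits in the holding disk's
    free space. Assumes dump size is close to data in use (hypothetical
    rule of thumb; compression and incrementals change this)."""
    return data_bytes <= holding_free_bytes
```

With slightly more than 1 GB of data and a holding partition of slightly
less than 1 GB, fits_in_holding() returns False, so that dump would be
expected to bypass the holding disk.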

> than the usual OS stuff. The 2.6 clients are dual processor Sun E220R
> webservers with no activity during the backup period. The 8 client and
> server are single processor E220R LDAP servers with no activity during the
> backup period. Perfmeter analysis indicates that the CPU usage goes to
> 100% shortly after the backup starts and stays there.

Do you have debug turned on for all clients and servers? The first thing
I'd want to see is the debug output and then the actual logs. When the CPU
starts spinning at 100%, what process is the culprit? We need more info
here. Are you using ufsdump or tar? Any patches to amanda?
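
One way to answer the "which process" question, sketched in Python under
some assumptions: it expects the text output of `ps -e -o pcpu,pid,args`
captured while the CPU is pegged, and the function name is hypothetical.

```python
def busiest_processes(ps_output, limit=3):
    """Return up to `limit` (pcpu, pid, command) tuples, busiest first.

    ps_output is assumed to be the text of `ps -e -o pcpu,pid,args`,
    including its one-line header.
    """
    rows = []
    for line in ps_output.splitlines()[1:]:   # skip the header line
        parts = line.split(None, 2)           # %CPU, PID, rest of line
        if len(parts) < 3:
            continue                          # ignore blank or short lines
        pcpu, pid, args = parts
        rows.append((float(pcpu), int(pid), args.strip()))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]
```

Saving the ps output to a file every minute or so from cron before the
backup starts would leave a record on disk even if the machine has to be
powered off afterwards.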

Bill Carlson
--
Systems Programmer    [EMAIL PROTECTED]  | Anything is possible,
Virtual Hospital      http://www.vh.org/      | given time and money.
University of Iowa Hospitals and Clinics      |
Opinions are mine, not my employer's.         |

