We're running into problems deploying new fileservers. We've had
intermittent "watchdog reset" errors and, more recently, reports of
failed writes on the external disks. We currently have 5 identically
configured machines.
If anyone has experienced similar problems, please let me know how
(or if) the problems were resolved. If you're running fine with a
similar configuration, I'd also like to know.
Please reply to me and I'll summarize.
Thanks,
Walter
a. Watchdog reset
The watchdog resets occur intermittently and do NOT seem to be load
related. Running 'ctrace' from the console shows that the machine died
in the procedure _idlework. In addition, the 'PC' value reported by
'.registers' is 1, which is not a valid address.
Three of the 5 machines have had this error, but only two have hit it
repeatedly. Replacing the CPU board on the most frequent offender seems
to have helped, so perhaps we got a run of bad hardware.
b. Disk errors
Of the 20 external disks, only 8 have NOT reported something along the
lines of:
Dec 11 02:17:08 vice3 vmunix: sd4c: Error for command 'write'
Dec 11 02:17:08 vice3 vmunix: sd4c: Error Level: Fatal
Dec 11 02:17:08 vice3 vmunix: sd4c: Block 1735392, Absolute Block: 1735392
Dec 11 02:17:08 vice3 vmunix: sd4c: Sense Key: Media Error
Dec 11 02:17:08 vice3 vmunix: sd4c: Vendor 'SEAGATE' error code: 0x12
The disks are Seagate ST32550WC drives supplied by Sun, labeled and
formatted as Sun's standard 2.1GB drive, in the UniPack enclosure.
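To see which disks report media errors, and at which blocks, the log lines quoted above can be tallied per host and disk. The following is a minimal sketch, assuming the syslog file (typically /var/adm/messages on SunOS) uses exactly the format in the excerpt and that each report prints its 'Block' line before its 'Sense Key' line. The helper name tally_media_errors is hypothetical.

```python
import re
from collections import defaultdict

# Matches SCSI error lines in the format quoted above, e.g.
# "Dec 11 02:17:08 vice3 vmunix: sd4c: Sense Key: Media Error".
# The trailing partition letter (a-h) is dropped so errors group per disk.
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+vmunix:\s+"
    r"(?P<dev>sd\d+)[a-h]:\s+(?P<msg>.*)$"
)
BLOCK_RE = re.compile(r"^Block (\d+),")


def tally_media_errors(lines):
    """Return {(host, disk): [block, ...]} for each 'Media Error' report.

    Assumes the 'Block' line of a report precedes its 'Sense Key' line;
    if no block was seen for that disk, None is recorded instead.
    """
    last_block = {}
    errors = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a vmunix sd error line
        key = (m["host"], m["dev"])
        msg = m["msg"]
        b = BLOCK_RE.match(msg)
        if b:
            last_block[key] = int(b.group(1))
        elif msg.startswith("Sense Key: Media Error"):
            errors[key].append(last_block.pop(key, None))
    return dict(errors)


if __name__ == "__main__":
    sample = [
        "Dec 11 02:17:08 vice3 vmunix: sd4c: Error for command 'write'",
        "Dec 11 02:17:08 vice3 vmunix: sd4c: Error Level: Fatal",
        "Dec 11 02:17:08 vice3 vmunix: sd4c: Block 1735392, Absolute Block: 1735392",
        "Dec 11 02:17:08 vice3 vmunix: sd4c: Sense Key: Media Error",
        "Dec 11 02:17:08 vice3 vmunix: sd4c: Vendor 'SEAGATE' error code: 0x12",
    ]
    print(tally_media_errors(sample))  # {('vice3', 'sd4'): [1735392]}
```

Running it over the messages files from all five servers would show whether the failures cluster on particular disks, enclosures, or block ranges.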
c. Configuration summary
. Sparc 20/71 w/48MB memory
. 1 - internal 2.1GB Seagate Hawk drive (from Sun)
. 4 - external 2.1GB Seagate Barracuda drives (ST32550WC rev 412)
. 1 - Fast/SCSI SBus card (2 external disks attached to this card)
. SunOS 4.1.3_U1B w/Sun recommended patches
. AFS 3.3a dedicated fileserver