Oh yes, we replaced the power supplies (which are redundant), the memory,
the motherboard, everything, literally, except the case.  We started with
one or two pieces and kept replacing more and more until we just sent out
another server and swapped hard drives.  When that didn't work, we even
did backups, replaced the entire server with a new server directly from the
factory (read: a different batch of servers), and restored the entire backup.

All of this was to no avail, as the systems eventually reboot, even after
all these swap-outs.

-----Original Message-----
From: Ed Esgro [mailto:[EMAIL PROTECTED]] 
Sent: Wednesday, March 20, 2002 9:05 AM
To: NT 2000 Discussions
Subject: RE: Spontaneous Reboots on Dell w/Win2K with NO TRACE! Absolute Mystery!

Does the server have redundant power supplies? Have you tried replacing them
at all? Have you tried changing the UPS that the server is on?

-----Original Message-----
From: Szlucha, Chris [mailto:[EMAIL PROTECTED]] 
Sent: Wednesday, March 20, 2002 9:01 AM
To: NT 2000 Discussions
Subject: Spontaneous Reboots on Dell w/Win2K with NO TRACE! Absolute Mystery!

Ok, here is something that we've been working on that has gone all the way
up to Michael Dell himself that I'd like some input on from you guys.

Has anyone seen spontaneous reboots on Dell systems where there is
absolutely no trace left anywhere in either the Windows environment or the
hardware environment (Dell ESM logs)?  Dell's "top engineers" and 5 of us
here at the SEC have been working on it for literally 3 months, almost every
day, to no avail.

Here's the configuration:
Hardware:
Dell 2550
Dual PIII 1133 MHz (BIOS v A05)
2 GB RAM
PERC 3 PCI RAID Controller
4 x 72 GB Fujitsu Hard Drives
Intel 8255x-based Integrated Fast Ethernet NIC
DRAC-II card
External PowerVault 128T LTO Tape Library connected via Adaptec AIC-7899 PCI
SCSI card



Software:
Windows 2000 Server w/ SP2
Terminal Services for remote admin
Veritas Backup Exec v8.6
Remotely Anywhere
Dell Server Agents as follows:
Dell OpenManage Server Agent v. 4.3.0 (BLD_2922)
DRAC-II Server Monitoring SNMP MIB Agent v. 2.0, Firmware v. 2.40
Dell OpenManage Array Manager v. 3.0
Network Associates NetShield 4.5, current engine and DATs
Executive Software Network Undelete v. 2
WQuinn Associates Storage CeNTral v.4.1 build 461

We use these servers only for file and print serving with no other funny
software installed and no "unnecessary" services running.  All flash-able
components have been flashed to the current level and drivers are
up-to-date.  During the installation of Veritas Backup Exec, we also have the
Veritas drivers installed for the backup devices.

These servers reboot at random and leave no trace in the event logs, nothing
in the hardware logs about any hardware issues.  There is no blue screen and
no Dr. Watson events, no system dumps, literally NOTHING to trace this to
anything or give us any indication as to where to start looking.

We have picked apart our build process, which BTW works absolutely perfectly
on a Compaq server, and Dell has even taken one of our rebooting systems
back to their labs for analysis, again to no avail.

The failure rate for us was somewhere around 75-80% on these machines.  It
seemed for a while to be hardware, as we could sometimes replace the
motherboard and memory and have the systems work again.  But then we had
repeat performances of the reboots.  Some systems reboot almost
immediately; others run for a month and a half before rebooting.  We
have stress-tested these systems using 2 or 3 different stress test
packages, and the reboots have been reproduced in the lab only once.

This is a real head-scratcher.  Any thoughts?  And remember, the easy things
have more than likely already been thought of and tried, but I'm willing to
entertain any ideas (and so is Dell at this point).

Thanks all!
-Chris

------
You are subscribed as [EMAIL PROTECTED]
Archives: http://www.swynk.com/sitesearch/search.asp
To unsubscribe send a blank email to %%email.unsub%%
