Hi there!

Our Gentoo server has gone down twice in under two days. I am
currently trying to figure out what is happening.

Facts:
- Dual PIII 933 MHz system (ServerWorks OSB4)
- 3.5GB RAM
- 2.6.11.2-grsec-20050614 kernel (self rolled)
- SCSI: Adaptec AIC-7892P, 32MB cache
- Disks
  + For the operating system
    - 2x IBM DDYS-T09170N SCSI U160 10KRPM 9.1GB in a RAID1, plus 1x of
      the same as a hot spare
  + For storage etc.
    - 3x IBM IC35L036UWD210-0 SCSI U160 10KRPM
    - 1x IBM DDYS-T36950N SCSI U160 10KRPM
    - In a RAID5

Tuesday afternoon, I was informed that there might be problems with this
server. I had just been working on it via shell. I went back, and found
it unresponsive.

I went into the server room, only to catch it ending a reboot and being
almost totally back up. It behaved the rest of the day. I was not able
to find any indications of problems in the logs.

Wednesday evening, I was again working on the system via ssh, and it
stopped responding. I got into the server room fast enough this time. I
tried to log in as root, and could not. I could type the username, but
upon hitting enter, nothing happened. That was true for any console.

I have syslogd output *.* to console 10, so flipping over there, I saw
nothing out of the ordinary. The last log entry, from around the time I
noticed it stop responding, was a simple run-of-the-mill firewall log.

After a few more minutes, the system was completely unresponsive, save
for SysRq. I Synced, tErmed, Synced again, remounted everything
read-only and forced it to reboot.
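
For reference, the same key sequence can also be sent from a root shell
(if one is still alive) by writing the letters to /proc/sysrq-trigger.
This is only a minimal sketch, assuming a kernel built with
CONFIG_MAGIC_SYSRQ; the function name sysrq_sequence and its "dry"/"live"
argument are made up for illustration and are not an existing tool.

```shell
#!/bin/sh
# Sketch: replay the SysRq sequence Sync, tErm, Sync, remount read-only (u),
# reBoot through /proc/sysrq-trigger.
# Assumptions: CONFIG_MAGIC_SYSRQ is enabled and "live" mode is run as root.
# sysrq_sequence and its mode argument are hypothetical names.

sysrq_sequence() {
    mode="$1"                              # "live" acts, anything else only prints
    trigger="${2:-/proc/sysrq-trigger}"    # file the key letters are written to
    for key in s e s u b; do
        if [ "$mode" = "live" ]; then
            echo "$key" > "$trigger"
            sleep 2                        # let sync/term settle before the next key
        else
            echo "would write '$key' to $trigger"
        fi
    done
}

# Dry run: prints the five steps without touching the system.
sysrq_sequence dry
```

Running "sysrq_sequence live" as root would actually reboot the machine,
so the default above only prints what would be sent.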

Again I was not able to find any logs indicating any errors at all.

I see only two possible leads. The first is that I was goofing with
Samba at various points on both days. However, Samba was not running at
either time the system went down.

The other, more interesting one, is that both times the system went
down, I was creating a tar.bz2 out of a kernel source tree. The problems
happened well after I had started the archive jobs.

Wondering about the disks, I threw smartctl -a at both of the arrays
(sda, sdb), which didn't show anything out of the ordinary.

However, when I run smartctl -t offline, -t short or -t long on sda or
sdb, it immediately reports a failure on stdout. I find this odd,
because I have run these tests in the past. Granted, that was on a
different kernel, which I no longer have around.

Here is an example:

# smartctl -t short /dev/sda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Short Background Self Test Failed

Looking at the logs, including dmesg, I don't see anything strange.

I am worried by the smartctl results, though I realize there is a small
possibility that they are due to kernel changes.
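
For anyone who wants to compare against their own setup, here is a
minimal sketch of collecting the device identity, self-test log and
error log from both devices in one go. It uses only the smartctl options
-i, -l selftest and -l error; the function name smart_report and the
SMARTCTL override variable are made up for illustration.

```shell
#!/bin/sh
# Sketch: print identity, self-test log and error log for each device given.
# Assumptions: smartmontools is installed and the script is run as root.
# smart_report and the SMARTCTL variable are hypothetical names.

SMARTCTL="${SMARTCTL:-smartctl}"   # override to test without real disks

smart_report() {
    for dev in "$@"; do
        echo "=== $dev ==="
        for opt in "-i" "-l selftest" "-l error"; do
            # $opt is left unquoted on purpose so "-l selftest" becomes two arguments
            $SMARTCTL $opt "$dev" || echo "smartctl $opt $dev exited with status $?"
        done
    done
}

# Example call for the two devices mentioned above:
#   smart_report /dev/sda /dev/sdb
```

A non-zero exit status from smartctl does not always mean a failed
disk, so the status is only reported and the loop carries on.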

Any ideas out there? Thank you for reading this! I *LOVE* Gentoo in
production.
-- 
[email protected] mailing list
