So. It just happened again. My server crashed. This time I am sure it
has nothing to do with the USB drive I had attached, since it is no longer
connected.

It seems to be an unfortunate combination of a kernel(?) problem and
heavy disk use.

I just suddenly get these messages in the log:

Oct 23 00:56:13 matrix kernel: [14573759.262982] ata1: link is slow to respond, please be patient (ready=0)
Oct 23 00:56:13 matrix kernel: [14573764.242683] ata1: device not ready (errno=-16), forcing hardreset
Oct 23 00:56:13 matrix kernel: [14573764.242721] ata1: soft resetting link
Oct 23 00:56:13 matrix kernel: [14573765.081129] ata1.00: configured for UDMA/133
Oct 23 00:56:13 matrix kernel: [14573765.081188] ata1: EH complete
Oct 23 00:56:13 matrix kernel: [14573765.082422] sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Oct 23 00:56:13 matrix kernel: [14573765.126583] sd 0:0:0:0: [sda] Write Protect is off
Oct 23 00:56:53 matrix kernel: [14573765.127506] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

These just repeat until about 01:19, and then the log goes quiet until a
final entry at 7:54, where the server finally crashes (it just stops
responding to network requests, the keyboard and so on).
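
For anyone who wants to check how long the resets go on in their own log,
here is a minimal Python sketch that counts matching kernel messages per
minute. The function name count_per_minute and the regular expression are
my own assumptions, written against the line format shown in the excerpt
above (one entry per line, as in the real log file, not as wrapped in this
mail). It is only a rough aid for reading the log, not a diagnostic tool.

```python
import re
from collections import Counter

# Syslog-style prefix as seen in the excerpts, for example:
# "Oct 23 00:56:13 matrix kernel: [14573759.262982] ata1: link is slow ..."
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+)\s+(?P<hhmm>\d\d:\d\d):\d\d\s+\S+\s+"
    r"kernel:\s+\[\s*\d+\.\d+\]\s+(?P<msg>.*)$"
)


def count_per_minute(lines, needle="link is slow to respond"):
    """Count kernel log lines containing `needle`, grouped by (month, day, HH:MM)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and needle in match.group("msg"):
            key = (match.group("month"), match.group("day"), match.group("hhmm"))
            counts[key] += 1
    return counts


# Small demo on two lines copied from the excerpt above.
sample = [
    "Oct 23 00:56:13 matrix kernel: [14573759.262982] ata1: link is slow "
    "to respond, please be patient (ready=0)",
    "Oct 23 00:56:13 matrix kernel: [14573764.242721] ata1: soft resetting link",
]
for key, hits in sorted(count_per_minute(sample).items()):
    print(" ".join(key), hits)
```

To run it on a real log, pass an open file instead of the sample list, for
example open("/var/log/kern.log") (the path is an assumption; adjust it to
wherever your kern.log lives).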

I just checked kern.log, which has a lot of entries like:

Oct 23 00:54:12 matrix kernel: [14573754.220270] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct 23 00:56:13 matrix kernel: [14573754.220348] ata1.00: cmd ca/00:50:14:9f:8d/00:00:00:00:00/e1 tag 0 dma 40960 out
Oct 23 00:56:13 matrix kernel: [14573754.220352]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 23 00:56:13 matrix kernel: [14573754.220465] ata1.00: status: { DRDY }
Oct 23 00:56:13 matrix kernel: [14573759.262982] ata1: link is slow to respond, please be patient (ready=0)
Oct 23 00:56:13 matrix kernel: [14573764.242683] ata1: device not ready (errno=-16), forcing hardreset
Oct 23 00:56:13 matrix kernel: [14573764.242721] ata1: soft resetting link
Oct 23 00:56:13 matrix kernel: [14573765.081129] ata1.00: configured for UDMA/133
Oct 23 00:56:13 matrix kernel: [14573765.081188] ata1: EH complete
Oct 23 00:56:13 matrix kernel: [14573765.082422] sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Oct 23 00:56:13 matrix kernel: [14573765.126583] sd 0:0:0:0: [sda] Write Protect is off
Oct 23 00:56:13 matrix kernel: [14573765.126598] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 23 00:56:53 matrix kernel: [14573765.127506] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

This adds some more info about an exception?
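
As far as I can tell, the "cmd" line is the ATA taskfile that timed out,
printed as command/feature:nsect:lbal:lbam:lbah/(the same five "hob"
bytes)/device. That layout is my reading of the libata output, so treat it
as an assumption; it does at least agree with the log, since a sector count
of 0x50 (80) times 512 bytes gives the "dma 40960" shown. Here is a small
Python sketch that decodes it. The helper names decode_cmd and
decode_status are made up for illustration, and only 28-bit LBA commands
such as 0xCA are handled.

```python
# A few ATA command opcodes; 0xCA is the one in the log above.
ATA_COMMANDS = {
    0xC8: "READ DMA",
    0xCA: "WRITE DMA",
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
}

# ATA status register bits.
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]


def decode_cmd(cmd):
    """Decode a libata 'cmd' string, assuming a 28-bit LBA command."""
    command, low, _hob, device = cmd.split("/")
    _feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    command = int(command, 16)
    device = int(device, 16)
    # LBA28: the low nibble of the device register holds LBA bits 24..27.
    lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    sectors = nsect or 256  # in LBA28 a count of 0 means 256 sectors
    return {
        "command": command,
        "name": ATA_COMMANDS.get(command, "unknown"),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * 512,
    }


def decode_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for bit, name in STATUS_BITS if status & bit]


info = decode_cmd("ca/00:50:14:9f:8d/00:00:00:00:00/e1")
print(info["name"], "lba", info["lba"], "bytes", info["bytes"])
print(decode_status(0x40))
```

Read this way, the failing request is a WRITE DMA of 80 sectors that never
completed (Emask 0x4, timeout), and the result status byte 0x40 is just
DRDY, which matches the "status: { DRDY }" line.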

Searching for these entries turns up a lot of people reporting the same
problem, and probably a solution:
http://ubuntuforums.org/showthread.php?t=1145513
(The poster in that thread wonders why there haven't been many reports of
this issue...)

Also:
https://bugzilla.redhat.com/show_bug.cgi?id=462425
https://bugzilla.redhat.com/show_bug.cgi?id=404851
http://lkml.org/lkml/2008/11/9/22
http://forums.fedoraforum.org/showthread.php?t=219746

I'm running kernel 2.6.27-11-server. Someone suggested running kernel-rt
instead:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/279693 (comment
#23)

I haven't tried that. I will check whether kernel 2.6.27-14 is available,
or possibly try the -rt suggestion.

It seems it is possible to crash the system by doing an "ls -lR /". Not
what I expect from a Linux system...
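
If someone wants to generate the same kind of load in a more controlled
way, here is a rough Python sketch that walks a directory tree and stats
every entry, which is approximately the metadata traffic "ls -lR" causes.
The function name stat_tree is made up, and I am not claiming this triggers
the bug; it only produces a similar read pattern, and it can be pointed at
a smaller directory than / first.

```python
import os


def stat_tree(root):
    """Recursively lstat every file and directory below `root`.

    Returns (visited, errors): how many entries were stat'ed successfully
    and how many raised OSError (for example permission problems).
    """
    visited = 0
    errors = 0
    # os.walk skips directories it cannot list when no onerror is given.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
                visited += 1
            except OSError:
                errors += 1
    return visited, errors


# Demo on the current directory; replace "." with the tree to exercise.
print(stat_tree("."))
```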

Kind regards
Torben

** Bug watch added: Red Hat Bugzilla #462425
   https://bugzilla.redhat.com/show_bug.cgi?id=462425

** Bug watch added: Red Hat Bugzilla #404851
   https://bugzilla.redhat.com/show_bug.cgi?id=404851

-- 
Consistent repeating [ata1: link is slow to respond, please be patient ]
https://bugs.launchpad.net/bugs/297058
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.

