Your message dated Wed, 7 Oct 2009 22:11:38 +0200 with message-id <[email protected]> and subject line Re: linux-image-2.6.26-1-686: Additional info and still occuring has caused the Debian Bug report #462229, regarding sata - sata_nv - sata link fails on heavy load to be marked as done.
This means that you claim that the problem has been dealt with. If this is not the case it is now your responsibility to reopen the Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this message is talking about, this may indicate a serious mail system misconfiguration somewhere. Please contact [email protected] immediately.)

--
462229: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=462229
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: base
Severity: critical
Justification: causes serious data loss

-- System Information:
Debian Release: etch
APT prefers: stable
APT policy: (1001, 'stable')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18 customized
Locale: LANG=de_AT.UTF-8, LC_CTYPE=de_AT.UTF-8 (charmap=UTF-8)

Motherboard: ASUS M2NPV-MX
Chipset: NFORCE-MCP51, chipset revision 161
libata version 2.00
sata_nv 0000:00:0e:0: version 2.0

I encountered two strange problems concerning my SATA drives.

Chapter I)

One SAMSUNG SP084N PATA drive (/) [hda]
One SAMSUNG SP2004C Rev: VM10 / 05 SATA drive (payload) [sda] sata1
-> Using LVM2 (2.02.06-4) on non / partitions

On every boot I get this message, but I think it only shows that no further drive is attached?!? If so, it is a little confusing ...

-----------
ata2: SATA link down (SStatus 0 SControl 300)
ATA: abnormal status 0x7F on port 0x977
Vendor: ATA Model SP2004C
Type: Direct-Access ANSI SCSI revision: 05
-----------

sda1 (LVM) is used by Samba. When I tried to restore the data (about 90 GB) over the network (GBit) from a Windows backup client to the new Debian server, 33% was copied without problems. Then problems occurred:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: (BMDMA stat 0x20)
ata1.00: tag 0 cmd 0x35 Emask 0x1 stat 0x51 err 0x4 (device error)
ata1: EH complete
-----------
ata1.00: soft resetting port
ata1.00: limiting speed to UDMA/66
ata1.00: configured for UDMA/66
ata1.00: sd 0:0:0:0: SCSI error: return code = 0x08000002
------------
ata1.00: end_request: I/O error, dev sda, sector 31464335
ata1.00: printk: 127 messages suppressed
ata1.00: Buffer I/O error on device dm-6, logical block 3932986
ata1.00: lost page write due to I/O error on dm-6
sata1: EH complete
------------
sata1.00 speed down requested but no transfer mode left

The transfer rate dropped to 0.01 kb/s, and the filesystem was destroyed beyond repair. I tried this three times (new fs etc.). After the third attempt I was able to repair the fs, and I left the existing data on the drive because I thought the drive was damaged in those particular places. After this I was able to copy all the data and no more problems occurred that day, but a few days later the same situation came up again.

Because of these problems I 'went' to

Chapter II)

One SAMSUNG SP084N PATA drive (/) [hda]
One SAMSUNG SP2004C Rev: VM10 / 05 SATA drive (payload) [sda] sata1
-> Using LVM2 (2.02.06-4) on non / partitions
One SEAGATE ST3250410AS Rev: 3.AA /05 [sdb] sata2
One SEAGATE ST3250410AS Rev: 3.AA /05 [sdc] sata3
-> sdb1 and sdc2 in a RAID1 array (without LVM)

On every boot I get this message, but I think it only shows that no further drive is attached?!? If so, it is a little confusing ... compare with Chapter I)

-----------
ata4: SATA link down (SStatus 0 SControl 300)
ATA: abnormal status 0x7F on port 0x967
Vendor: ATA Model: ST3250410AS Rev: 3.AA
Type: Direct-Access ANSI SCSI revision: 05
-----------

So I copied all the previously backed-up data from hda and a Windows backup client to the newly created RAID1 array (90 GB). Everything went fine, at least with sdb and sdc. But I got these messages while I was copying the data from hda and the network to /dev/md0 (sdb and sdc).

-----------
ata1: port is slow to respond, please be patient
ata1: soft resetting port
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through
ata1: port is slow to respond, please be patient
ata1: soft resetting port
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through
-----------

ATTENTION!!! sda was not involved in this 'thing'; it was only mounted (there were no open files). Therefore there was also no data loss.

So it seems that sata_nv (or maybe the mainboard) has a problem with one (?) of the SATA ports under heavy load.

Again: all of these situations occurred under heavy load (copying from multiple sources to 'one' destination). Otherwise I cannot understand why I get these errors on sata1/sda without doing anything on it?!?

Regards,
Anton Huber
--- End Message ---
--- Begin Message ---
On Tue, Sep 08, 2009 at 10:44:03PM +0200, Moritz Muehlenhoff wrote:
> On Sat, Feb 07, 2009 at 09:25:12PM +0000, Ben Whyte wrote:
> > Package: linux-image-2.6.26-1-686
> > Version: 2.6.26-13
> > Followup-For: Bug #462229
> >
> >
> > While doing disk writes I can achieve the following
> >
> > It is consistent across all disks, all ports, all cables. I have been
> > seeing this issue since June/July and it has cost me significant data
> > loss and forced 3 reinstalls as the OS has been terminally damaged.
> >
> > I have tried turning write cache off, as it has been mentioned as a
> > potential fix; this has not worked.
> >
> > Currently affecting 2 brand new WD 1 TB green drives.
>
> Did you try a more recent kernel than the standard Lenny kernel, e.g. a
> 2.6.30 kernel from backports.org?

No further feedback, closing the bug. If anyone reencounters the problem
with a more recent kernel, please reopen.

Cheers,
        Moritz
--- End Message ---
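
The original report suspects that the errors cluster on a single SATA port under heavy load. One rough way to check that against a saved kernel log is to tally the libata error lines per port. The sketch below is only an illustration: the function name count_port_errors and the ERROR_MARKERS list are hypothetical, and the marker strings are simply copied from the log excerpts quoted in the report, so they may not cover every message a given kernel version prints.

```python
import re
from collections import Counter

# Port prefix as it appears in the quoted excerpts, e.g. "ata1:" or "ata1.00:".
# The word boundary keeps "sata1:" from being counted as "ata1".
PORT_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?:")

# Assumption: substrings taken from the log excerpts in the report above.
ERROR_MARKERS = (
    "exception Emask",
    "soft resetting port",
    "port is slow to respond",
    "limiting speed",
    "I/O error",
)


def count_port_errors(lines):
    """Return a Counter mapping port name (e.g. 'ata1') to error-line count."""
    counts = Counter()
    for line in lines:
        match = PORT_RE.search(line)
        if match and any(marker in line for marker in ERROR_MARKERS):
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the report; replace with lines read from a log file.
    sample = [
        "ata1: port is slow to respond, please be patient",
        "ata1: soft resetting port",
        "ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)",
        "ata4: SATA link down (SStatus 0 SControl 300)",
        "ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0",
    ]
    for port, n in count_port_errors(sample).most_common():
        print(f"{port}: {n}")
```

If nearly all counted lines belong to one port while the other ports stay clean, that would be consistent with the reporter's observation about sata1/sda, although it cannot by itself distinguish a driver problem from a cable, drive, or mainboard fault.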

