Re: ad0: FAILURE - WRITE_DMA
On 8th October, Mikhail P. [EMAIL PROTECTED] reported the error:

> ad0: FAILURE - WRITE_DMA status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=268435455

On Sun, 10 Oct 2004, Søren Schmidt wrote:

> so that leaves the disks for scrutiny. One thing to try is change the
> tripping point where we switch from 28bit mode to 48 bit mode, could be
> a 1 off error in the firmware...

This sounds very possible to me. I have been experiencing the same error on a system that I've been trying to set up using 5.3-RC1 and a new 160 GB SATA drive. My hardware is:

    atapci0: <SiI 3112 SATA150 controller> port 0xb000-0xb00f,0xac00-0xac03,0xa800-0xa807,0xa400-0xa403,0xa000-0xa007 mem 0xdf081000-0xdf0811ff irq 18 at device 11.0 on pci1
    ad4: 152627MB <ST3160023AS/3.18> [310101/16/63] at ata2-master SATA150

(I notice that Mikhail and I both have Seagate drives ...)

I had problems with a filesystem on a partition which crossed the LBA=268435455 threshold. After googling and reading this thread and Søren's posting, I removed the filesystem and made a little 1000-sector partition which straddled the lba48 transition sector. I was able to get read and write failure messages of the above form reproducibly by dd-ing between the test partition and /dev/zero.

I edited the /usr/src/sys/dev/ata/ata-lowlevel.c file and reduced the 48-bit trigger level by one:

    --- ata-lowlevel.c.orig Fri Oct 29 12:06:09 2004
    +++ ata-lowlevel.c      Fri Oct 29 12:05:38 2004
    @@ -700,7 +700,7 @@
         ATA_IDX_OUTB(atadev->channel, ATA_ALTSTAT, ATA_A_4BIT);

         /* only use 48bit addressing if needed (avoid bugs and overhead) */
    -    if ((lba > 268435455 || count > 256) && atadev->param &&
    +    if ((lba > 268435454 || count > 256) && atadev->param &&
             atadev->param->support.command2 & ATA_SUPPORT_ADDRESS48) {

             /* translate command into 48bit version */

and built a new kernel (I'm using the stock GENERIC configuration). The resulting kernel was able to dd to and from the test partition without error.

I've now created a new filesystem that uses this part of the disk and restored the contents from backup, and have been actively using the filesystem for the last day without observing any further problems.

Regards,
--
Neil Hoggarth                        Departmental Computing Manager
[EMAIL PROTECTED]                    Laboratory of Physiology
http://www.physiol.ox.ac.uk/~njh/    University of Oxford, UK
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]
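The one-line change in Neil's patch can be read as lowering the largest starting LBA that is still sent with a 28-bit command from 268435455 (2^28 - 1, or 0x0FFFFFFF) to 268435454. The following is a minimal standalone sketch of that condition in C, not the driver code itself: `needs_lba48`, `LBA28_LIMIT_STOCK` and `LBA28_LIMIT_PATCHED` are hypothetical names, and the drive capability test from the real condition is reduced to a boolean flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest starting LBA the stock condition still sends as a 28-bit command. */
#define LBA28_LIMIT_STOCK   UINT64_C(268435455)  /* 2^28 - 1 = 0x0FFFFFFF */
/* The same limit after the patch: one sector lower. */
#define LBA28_LIMIT_PATCHED UINT64_C(268435454)

/*
 * Hypothetical helper with the same shape as the patched test: switch to
 * 48-bit addressing only when the request needs it (LBA above the limit or
 * more than 256 sectors) and the drive advertises 48-bit support.
 */
bool
needs_lba48(uint64_t lba, unsigned count, uint64_t limit, bool drive_has_lba48)
{
    return (lba > limit || count > 256) && drive_has_lba48;
}
```

With the stock limit, a one-sector request at LBA 268435455 is still issued as a 28-bit command, which is the request the drives in this thread reject; with the patched limit the same request is issued as a 48-bit command.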
Re: ad0: FAILURE - WRITE_DMA
> This sounds very possible to me. I have been experiencing the same
> error, on a system that I've been trying to set up using 5.3-RC1 and a
> new 160Gbyte SATA drives My hardware is:
>
> atapci0: <SiI 3112 SATA150 controller> port 0xb000-0xb00f,0xac00-0xac03,0xa800-0xa807,0xa400-0xa403,0xa000-0xa007 mem 0xdf081000-0xdf0811ff irq 18 at device 11.0 on pci1
> ad4: 152627MB <ST3160023AS/3.18> [310101/16/63] at ata2-master SATA150
>
> (I notice that Michail and I both have Seagate drives ...).
>
> I had problems with a filesystem on a partition which crossed the
> LBA=268435455 threshold. After googling and reading this thread and
> Søren's posting, I tried removing the filesystem and making a little
> 1000 sector partition which straddled the lba48 transition sector - I
> was able to get read and write failure messages of the above form
> reproducibly, by dd-ing between the test partition and /dev/zero.

The same problem with similar IDE Seagate HDD:

    ad0: <ST3160023A/3.06> ATA-6 disk at ata0-master
    ad0: 152627MB (312581808 sectors), 310101 C, 16 H, 63 S, 512 B
    [...]
    ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=268435455

It had 312581808 sectors, but failed at >= 268435455:

    bash-2.05b# dd if=/dev/ad0 of=/dev/null bs=512 skip=268435453
    dd: /dev/ad0: Input/output error
    2+0 records in
    2+0 records out
    1024 bytes transferred in 0.163827 secs (6250 bytes/sec)
    bash-2.05b# dd if=/dev/ad0 of=/dev/null bs=512 skip=268435454
    dd: /dev/ad0: Input/output error
    1+0 records in
    1+0 records out
    512 bytes transferred in 0.156888 secs (3263 bytes/sec)
    bash-2.05b# dd if=/dev/ad0 of=/dev/null bs=512 skip=268435455
    dd: /dev/ad0: Input/output error
    0+0 records in
    0+0 records out
    0 bytes transferred in 0.149888 secs (0 bytes/sec)

Decreasing the 48-bit LBA threshold by 1 really helped:

    bash-2.05b# dd if=/dev/ad0 bs=512 skip=312581808
    0+0 records in
    0+0 records out
    0 bytes transferred in 0.88 secs (0 bytes/sec)
    bash-2.05b# dd if=/dev/ad0 bs=512 skip=312581807
    1+0 records in
    1+0 records out
    512 bytes transferred in 0.019809 secs (25847 bytes/sec)

Timestamp: 0x41826DE9
[SorAlx] http://cydem.org.ua/
ridin' VN1500-B2
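The numbers in the dmesg and dd output above can be checked against the width of the address field: a 28-bit LBA has 2^28 = 268435456 possible values, which at 512 bytes per sector corresponds to exactly 128 GiB, while the drive reports 312581808 sectors. A small sketch of that arithmetic in C follows; the function and macro names are hypothetical, only the sector counts come from the message, and the code says nothing about which LBAs a given firmware actually accepts in 28-bit mode.

```c
#include <stdint.h>

#define SECTOR_SIZE   UINT64_C(512)
#define LBA28_VALUES  (UINT64_C(1) << 28)   /* 268435456 distinct 28-bit LBAs */

/* Bytes covered by all 28-bit LBA values: 2^28 * 512 = 2^37 = 128 GiB. */
uint64_t
lba28_span_bytes(void)
{
    return LBA28_VALUES * SECTOR_SIZE;
}

/* Number of sectors whose LBA does not fit in 28 bits (0 if there are none). */
uint64_t
sectors_beyond_lba28(uint64_t total_sectors)
{
    return total_sectors > LBA28_VALUES ? total_sectors - LBA28_VALUES : 0;
}
```

For the ST3160023A, `sectors_beyond_lba28(312581808)` gives 44146352, so roughly 21 GiB at the end of the disk can only be reached with 48-bit commands.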
Re: ad0: FAILURE - WRITE_DMA
On Friday 29 October 2004 16:44, [EMAIL PROTECTED] wrote:
> The same problem with similar IDE Seagate HDD:
>
> ad0: <ST3160023A/3.06> ATA-6 disk at ata0-master
> ad0: 152627MB (312581808 sectors), 310101 C, 16 H, 63 S, 512 B
> [...]
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=268435455

Perhaps it is only Seagate - FreeBSD5-related. The same drives work well with FreeBSD4, without a glitch.

regards,
M.
Re: ad0: FAILURE - WRITE_DMA
On Friday 29 October 2004 16:50, Mikhail P. wrote:
> Perhaps it is only Seagate - FreeBSD5-related. Same drives, but with
> FreeBSD4 do work well together without a glitch.

Actually, not only Seagates: something similar happened to me on a 200GB Western Digital drive with FreeBSD-5.3.

regards,
M.
Re: ad0: FAILURE - WRITE_DMA
On Fri, 29 Oct 2004, Mikhail P. wrote:
> On Friday 29 October 2004 16:50, Mikhail P. wrote:
> > Perhaps it is only Seagate - FreeBSD5-related. Same drives, but with
> > FreeBSD4 do work well together without a glitch.
>
> Actually not only seagates.. similar happened on a 200GB Western Digital
> drive to me, FreeBSD-5.3.

In FreeBSD 5.3b7 I have the same problem with the Maxtor 120GB IDE:

    ad2: 117246MB <Maxtor 6Y120L0/YAR41BW0> [238216/16/63] at ata1-master UDMA66
    ad2: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=14301663
    ad2: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=14301663
    ad2: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=14301663
    ad2: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=14301663
    ad2: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=14301663
    ad2: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=14301663
    ad2: TIMEOUT - READ_DMA retrying (2 retries left) LBA=160532482
    ad2: TIMEOUT - READ_DMA retrying (2 retries left) LBA=209834594
    ad2: TIMEOUT - READ_DMA retrying (2 retries left) LBA=218490706
    ad2: TIMEOUT - READ_DMA retrying (2 retries left) LBA=211340046
    ad2: TIMEOUT - READ_DMA retrying (2 retries left) LBA=209834594
    ad2: TIMEOUT - READ_DMA retrying (2 retries left) LBA=163587418
    ad2: TIMEOUT - READ_DMA retrying (2 retries left) LBA=209834786
    ad2: TIMEOUT - READ_DMA retrying (2 retries left) LBA=17312287

--
With best regards,   | The Power to Serve
Nguyen Tam Chinh     | http://www.FreeBSD.org
Loc: sp.cs.msu.ru    | http://chinhngt.svmgu.com
                     | http://www.gnu.org/copyleft/copyleft.html
Tel: +7 905 7814187  |
Re: ad0: FAILURE - WRITE_DMA
Add Western Digital Raptors to the list as well. However, I have not had a problem since 5.3-BETA3.

aaron.glenn

On Fri, 29 Oct 2004 16:57:33 +, Mikhail P. [EMAIL PROTECTED] wrote:
> On Friday 29 October 2004 16:50, Mikhail P. wrote:
> > Perhaps it is only Seagate - FreeBSD5-related. Same drives, but with
> > FreeBSD4 do work well together without a glitch.
>
> Actually not only seagates.. similar happened on a 200GB Western Digital
> drive to me, FreeBSD-5.3.
>
> regards,
> M.
Re: ad0: FAILURE - WRITE_DMA
On Sunday 10 October 2004 08:59, Søren Schmidt wrote:
> There is definitely something fishy here, since I don't have either the
> disks nor any VIA chips here in the lab I cannot do any testing here.
> However I don't know of any problems with the VIA chips in this regard,
> so that leaves the disks for scrutiny. One thing to try is change the
> tripping point where we switch from 28bit mode to 48 bit mode, could be
> a 1 off error in the firmware...

I apologize for bumping that old thread..

I have received both 200G drives (the ones that were giving me adX: FAILURE - WRITE_DMA on the 5.2.1 system). I plugged both drives into a running 4.10 system and re-formatted them to UFS1 from sysinstall. After filling those drives with 180G of data each (files ranging in size from 10k to 1G), I put a lot of load on them (e.g. transferred data between other drives in the system, deleted random files, dd, etc.) and those adX failures did not appear anymore. In fact, I have been running those drives on the file server for 5 days now, and there has not been a single failure/timeout so far - the system has been very stable all the time on FreeBSD-4.10.

On a side note - I changed the tripping point as suggested above and re-compiled the kernel on the running 5.2.1 system. Disk operations slowed down dramatically, as expected, but the number of timeouts decreased too (per dmesg, one or two timeouts in 3-4 days).

I should probably also note another interesting thing - on another system with 4 hard drives (20G, 60G, 120G, 200G) where I ran RELENG_5 for the past week, timeouts and failures were appearing randomly under heavy disk writes. That system had a mix of filesystems - the primary 20G drive had UFS2, and the rest of the drives were UFS1 (as they hold data, and I ran 4.7 on that system half a year ago). Data transfer between interfaces was horrible, less than 8-10mb/sec, even when the system was idle.

After re-installing the system with 4.10 (no changes to hardware etc. - all remained the same apart from the OS), I don't see timeouts/errors anymore, and the speed of transfers between the drives got back to 20-25mb/sec, even though the system isn't idle.

There is also a third system with 2 x 200G IDE drives and FBSD-5.2.1. Today I had to transfer approx. 160G of data from one of the drives to another system via NFS, and unfortunately some files could not be transferred due to the same ad1 failures as above.. I'm going to mount the drive read-only to finish the transfer.

regards,
M.
Re: ad0: FAILURE - WRITE_DMA
Martin Nilsson wrote:
> Something is rotten with ATA on 5.x (or I have a rotten motherboard!)
>
> I have an E7320 Lindenhurst VS 6300ESB box with 2*3GHz EM64T Xeons and
> 2*80GB Seagate SATA disks. Sometimes when booting the whole ATA/SATA
> system hangs after two READ_DMA or WRITE_DMA timeout errors. This seems
> to be more common when running as AMD64 than i386. I can't remember any
> hangs after the machine has been up nicely for a couple of min.

Today when starting the box with i386 RELENG_5 I got the following:

    ad4: TIMEOUT - WRITE_DMA LBA=4798015
    ad4: TIMEOUT - WRITE_DMA LBA=146847331
    panic: initiate_write_inodeblock_ufs2: already started

After a reboot and fsck it works nicely! A verbose dmesg (from a good boot) is here:
http://www.gneto.com/FreeBSD/i386-dmesg.boot

I really don't know what to do with this box, maybe put regular ATA or SCSI disks in it?

/Martin
Re: ad0: FAILURE - WRITE_DMA
On Thursday 14 October 2004 19:59, Martin Nilsson wrote:
> I really don't know what to do with this box, maybe put regular ATA or
> SCSI disks in it?

Well, there are no problems with SCSI to my knowledge. 5.3 and 5.2.1 work well on my SCSI servers; it is only the ATA driver.. It would be sad to still have these problems when 5.3 goes -STABLE. On the other hand, I expect more people to hit that problem and send more debugging information, so that the problem gets solved quicker.

regards,
M.
Re: ad0: FAILURE - WRITE_DMA
On Sunday 10 October 2004 23:30, Mikhail P. wrote:
> On Saturday 09 October 2004 17:01, Mikhail P. wrote:
> > I also got another message off-list, where author suggested to play
> > with UDMA values. I switched from UDMA100 to UDMA66. System's uptime
> > is 12 hours, and no timeouts so far.. but I'm quite sure they will
> > get back in few days.
>
> 1.5 days of uptime, running in UDMA66 changes nothing. Still getting

Well, now those timeouts popped up on a 5.3-BETA7 system with 4 IDE drives.. They start appearing with high disk activity. The system had FreeBSD-4.7 prior to that, and has been rock solid for almost a year. The drives have no problems, that's for sure (4.7 did not show any timeouts, with uptime for months).. I don't know what to think - is the ATA driver horribly broken in 5.x?

regards,
M.
Re: ad0: FAILURE - WRITE_DMA
Mikhail P. wrote:
> On Sunday 10 October 2004 23:30, Mikhail P. wrote:
> > On Saturday 09 October 2004 17:01, Mikhail P. wrote:
> > > I also got another message off-list, where author suggested to play
> > > with UDMA values. I switched from UDMA100 to UDMA66. System's uptime
> > > is 12 hours, and no timeouts so far.. but I'm quite sure they will
> > > get back in few days.
> >
> > 1.5 days of uptime, running in UDMA66 changes nothing. Still getting
>
> Well, now those timeouts popped up on 5.3-BETA7 system with 4 IDE
> drives.. They start appearing with high disk activity. System had
> FreeBSD-4.7 prior to that, and has been rock solid for almost a year.
> Drives have no problems, that's for sure (4.7 did not show up any
> timeouts, with uptime for months).. I don't know what to think - is ATA
> driver horribly broken in 5.x?

Well, that's not up to me to judge I guess, but have you tried to change the tripping point for using 48-bit addressing as I suggested earlier?

I can't reproduce this problem with any of the shelfmeters of ATA gear I have here, so your help is needed or it will stay horribly broken :)

--
-Søren
Re: ad0: FAILURE - WRITE_DMA
Mikhail P. wrote:
> Well, now those timeouts popped up on 5.3-BETA7 system with 4 IDE
> drives.. They start appearing with high disk activity. System had
> FreeBSD-4.7 prior to that, and has been rock solid for almost a year.
> Drives have no problems, that's for sure (4.7 did not show up any
> timeouts, with uptime for months).. I don't know what to think - is ATA
> driver horribly broken in 5.x?

Something is rotten with ATA on 5.x (or I have a rotten motherboard!)

I have an E7320 Lindenhurst VS ICH5R box with 2*3GHz EM64T Xeons and 2*80GB Seagate SATA disks. Sometimes when booting the whole ATA/SATA system hangs after two READ_DMA or WRITE_DMA timeout errors. This seems to be more common when running as AMD64 than i386. I can't remember any hangs after the machine has been up nicely for a couple of min.

The 1U box is so noisy that I can't be in the apartment at the same time without going crazy. This, and the fact that I can't reproduce the hang reliably, effectively prevents most debugging attempts.

/Martin
Re: ad0: FAILURE - WRITE_DMA
On Wednesday 13 October 2004 13:51, Søren Schmidt wrote:
> Well, thats not up to me to judge I guess, but have you tried to change
> the tripping point for using 48Bit addressing as I suggested earlier ?

How would one do it? In the BIOS? Forgive my ignorance.

> I cant reproduce this problem with any of the shelfmeters of ATA gear I
> have here, so your help is needed or it will stay horribly broken :)

The 5.3-BETA7 box I was referring to is a whole different machine from the one I posted about initially (2 x 200GB IDE). This machine has 4 IDE drives -

    20GB Seagate
    60GB IBM
    120GB WDC
    200GB WDC

and it is a P4 motherboard (CPU is a 1.5GHz P4).

regards,
M.
Re: ad0: FAILURE - WRITE_DMA
I used to get that error prior to 5.3-BETA3 (5.2.1-RELEASE, and all previous 5.3-BETAs). Randomly after reboot the machine would spew about 100 of these and then hardlock. I've got two identical boxes running BETA3 and BETA7 without any issues. Intel 6300ESB controller and Western Digital Enterprise Serial ATA Raptor drives are the hardware.

I thought about posting the issue, but decided against it since it was BETA 1 or BETA 2 and 5.2.1 was, honestly, nothing but pure crap.

Regards,
aaron.glenn

On Sun, 10 Oct 2004 23:30:26 +, Mikhail P. [EMAIL PROTECTED] wrote:
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=268435455
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=268435455
Re: ad0: FAILURE - WRITE_DMA
On Saturday 09 October 2004 17:01, Mikhail P. wrote:
> I also got another message off-list, where author suggested to play with
> UDMA values. I switched from UDMA100 to UDMA66. System's uptime is 12
> hours, and no timeouts so far.. but I'm quite sure they will get back in
> few days.

1.5 days of uptime, running in UDMA66 changes nothing. Still getting

    ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=268435455
    ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=268435455

regards,
M.
Re: ad0: FAILURE - WRITE_DMA
Mikhail P. [EMAIL PROTECTED] writes:
> I reloaded OS on the new drives, then restored all data from the old
> drives. All seemed to be fine for 2 months now... but today I woke up,
> and noticed these messages again.

A lot of them, or just one or two? Some ATA drives will spin down at regular intervals to recalibrate, and you'll get a harmless timeout if you try to write to the disk while it's doing that.

DES
--
Dag-Erling Smørgrav - [EMAIL PROTECTED]
Re: ad0: FAILURE - WRITE_DMA
On Saturday 09 October 2004 15:01, Dag-Erling Smørgrav wrote:
> Mikhail P. [EMAIL PROTECTED] writes:
> > I reloaded OS on the new drives, then restored all data from the old
> > drives. All seemed to be fine for 2 months now... but today I woke up,
> > and noticed these messages again.
>
> A lot of them, or just one or two? Some ATA drives will spin down at
> regular intervals to recalibrate, and you'll get a harmless timeout if
> you try to write to the disk while it's doing that.

Unfortunately, all the drives (so far - four 200GB drives). I'm having the previous two drives shipped here within two weeks. Most likely these drives aren't actually corrupted.. I will stress them locally here.

> DES

regards,
M.
Re: ad0: FAILURE - WRITE_DMA
On Sat, 9 Oct 2004, Mikhail P. wrote:

MP> > > I reloaded OS on the new drives, then restored all data from the old
MP> > > drives. All seemed to be fine for 2 months now... but today I woke up,
MP> > > and noticed these messages again.
MP> >
MP> > A lot of them, or just one or two? Some ATA drives will spin down at
MP> > regular intervals to recalibrate, and you'll get a harmless timeout if
MP> > you try to write to the disk while it's doing that.
MP>
MP> Unfortunately, all the drives (so far - four 200GB drives).
MP> I'm having the previous two drives shipped here within two weeks.
MP> Most likely these drives aren't corrupted actually.. will stress them locally
MP> here.

Well, I suppose Dag-Erling means 'a lot of errors', as opposed to one or two raised sporadically...

Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***
Re: ad0: FAILURE - WRITE_DMA
Mikhail P. [EMAIL PROTECTED] writes:
> On Saturday 09 October 2004 15:01, Dag-Erling Smørgrav wrote:
> > A lot of them, or just one or two? Some ATA drives will spin down at
> > regular intervals to recalibrate, and you'll get a harmless timeout if
> > you try to write to the disk while it's doing that.
>
> Unfortunately, all the drives (so far - four 200GB drives).

I meant a lot of timeouts, not a lot of drives. If you only get one or two timeouts per drive at regular intervals (say, once a month), they're just recalibrating and there's nothing to worry about.

BTW, are you using ataidle or anything similar?

DES
--
Dag-Erling Smørgrav - [EMAIL PROTECTED]
Re: ad0: FAILURE - WRITE_DMA
On Saturday 09 October 2004 16:23, Dag-Erling Smørgrav wrote:
> Mikhail P. [EMAIL PROTECTED] writes:
> > On Saturday 09 October 2004 15:01, Dag-Erling Smørgrav wrote:
> > > A lot of them, or just one or two? Some ATA drives will spin down at
> > > regular intervals to recalibrate, and you'll get a harmless timeout
> > > if you try to write to the disk while it's doing that.
> >
> > Unfortunately, all the drives (so far - four 200GB drives).
>
> I meant a lot of timeouts, not a lot of drives. If you only get one or
> two timeouts per drive at regular intervals (say, once a month), they're
> just recalibrating and there's nothing to worry about.

Well, there is no pattern. Often it just happens by itself - the system runs fine for 3-10 days (no warnings, no timeouts), and after that time I start seeing lots of these.

To be more exact: for example, I have a user whose home dir is /home/user, and the user uses FTP to upload/download files under that directory. Let's say he has 5k files in total (ranging in size from 1kb to 20mb). What happens is that when the user tries to access certain files (either to continue an upload or to continue a download), the system spews lots of these timeouts and basically an input/output error occurs.

For example, yesterday it showed 360 of these messages during a 12 hour period, and unfortunately while I was sleeping the system locked up - the last message in /var/log/messages was regarding an ad0 failure. I'm not exactly sure on which files it timed out yesterday, but I do know under which directory it happened - the directory has 20k files in it (not in a single dir, but including subdirs). Maybe someone knows a quick way I could open every file under that directory - this could probably help to identify exactly on which file the timeouts happened.

Before replacing the drives, I had that server up for 120 days, and it did spew these messages (more and more with every day, starting on about the 90th day of uptime).

After rebooting the system, it asked for fsck, which I did run, but it showed some softupdates inconsistencies and refused to mount /home rw. By the way, I just ran fsck on the rw-mounted /home (that's where those timeouts occurred yesterday), and I have attached its output.

I also got another message off-list, where the author suggested playing with UDMA values. I switched from UDMA100 to UDMA66. The system's uptime is 12 hours, and no timeouts so far.. but I'm quite sure they will be back in a few days.

> BTW, are you using ataidle or anything similar?

nope, nothing.

> DES

regards,
M.

[EMAIL PROTECTED]:/usr/local/etc/rc.d fsck /home
** /dev/ad0s1g (NO WRITE)
** Last Mounted on /home
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
LINK COUNT FILE I=8715003 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715004 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715005 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715006 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715007 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715008 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715009 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715010 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715016 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715017 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715080 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715086 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715087 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715093 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715094 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715100 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715101 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715107 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715129 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715142 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715143
Re: ad0: FAILURE - WRITE_DMA
Mikhail P. wrote:
> Hi,
>
> This question probably has been discussed numerous times, but I'm
> somewhat unsure what really causes ATA failures..
>
> I have pretty basic server here which has two IDE drives - each is
> 200GB. System is FreeBSD-5.2.1-p9
>
> That server has been setup about 9 months ago, and just about 3 months
> ago my logs quickly filled up with:
>
> ad0: FAILURE - WRITE_DMA status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=268435455

Hmm, that means that the drive couldn't find the sector you asked for. Now, what has me wondering is that it is the exact sector where we switch to 48-bit addressing mode. Anyhow, I've just checked on the old Maxtor preproduction 48-bit reference drive I have here and it crosses the limit with no problems.

What controller are you using? Not all of them support 48-bit mode correctly..

--
-Søren
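The 51 and 10 in the quoted log line are hexadecimal register values, and the names in angle brackets are the bits set in each: 0x51 = 0x40 | 0x10 | 0x01 gives READY, DSC and ERROR, and bit 0x10 of the error register is the "ID not found" condition Søren describes. A small decoding sketch in C follows; it is an illustration rather than the driver's own formatting code, the table covers only a few bits, and the names BUSY, DRQ and ABORTED are assumptions chosen to match the style of the log (only READY, DSC, ERROR and NID_NOT_FOUND appear in the thread).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct bit_name {
    uint8_t     bit;
    const char *name;
};

/* ATA status register bits, highest bit first (subset). */
static const struct bit_name status_bits[] = {
    { 0x80, "BUSY" },
    { 0x40, "READY" },
    { 0x10, "DSC" },     /* device seek complete */
    { 0x08, "DRQ" },     /* data request */
    { 0x01, "ERROR" },   /* details are in the error register */
};

/* ATA error register bits (subset). */
static const struct bit_name error_bits[] = {
    { 0x10, "NID_NOT_FOUND" },  /* requested sector address not found */
    { 0x04, "ABORTED" },        /* command aborted */
};

/* Write the comma-separated names of all set bits into out. */
static void
decode_bits(uint8_t value, const struct bit_name *table, size_t n,
    char *out, size_t outlen)
{
    size_t i;

    if (outlen == 0)
        return;
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        if ((value & table[i].bit) == 0)
            continue;
        if (out[0] != '\0')
            strncat(out, ",", outlen - strlen(out) - 1);
        strncat(out, table[i].name, outlen - strlen(out) - 1);
    }
}

void
decode_status(uint8_t status, char *out, size_t outlen)
{
    decode_bits(status, status_bits,
        sizeof(status_bits) / sizeof(status_bits[0]), out, outlen);
}

void
decode_error(uint8_t error, char *out, size_t outlen)
{
    decode_bits(error, error_bits,
        sizeof(error_bits) / sizeof(error_bits[0]), out, outlen);
}
```

Decoding 0x51 with `decode_status` and 0x10 with `decode_error` reproduces the two bracketed lists from the log line.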
Re: ad0: FAILURE - WRITE_DMA
On Saturday 09 October 2004 18:26, Søren Schmidt wrote:
> Hmm, that means that the drive couldn't find the sector you asked for.
> Now, what has me wondering is that it is the exact sector where we
> switch to 48bit addressing mode. Anyhow, I've just checked on the old
> Maxtor preproduction 48bit reference drive I have here and it crosses
> the limit with no problems.
>
> What controller are you using ? not all supports 48bit mode correctly..

It is a VIA motherboard (not sure about the model name). Here's the info regarding the ATA controller from dmesg:

    atapci0: <VIA 8235 UDMA133 controller> port 0xac00-0xac0f at device 17.1 on pci0

I will be able to test the drives (the ones which I thought had failed) on another board within 10 days or so.

regards,
M.
Re: ad0: FAILURE - WRITE_DMA
Mikhail P. [EMAIL PROTECTED] writes:
> Well, there is no pattern. [...]

Could be bad cables, could be bad drives. Environmental factors are a more likely cause, though. Are all the failing disks in the same machine? If they're in separate machines, are those rack-mount, or are they standing on a table or shelf? If a shelf, what kind? What's the ambient temperature in the machine room?

DES
--
Dag-Erling Smørgrav - [EMAIL PROTECTED]
Re: ad0: FAILURE - WRITE_DMA
On Saturday 09 October 2004 20:53, Dag-Erling Smørgrav wrote:
> Mikhail P. [EMAIL PROTECTED] writes:
> > Well, there is no pattern. [...]
>
> Could be bad cables, could be bad drives. Environmental factors are a
> more likely cause, though. Are all the failing disks in the same
> machine? If they're in separate machines, are those rack-mount, or are
> they standing on a table or shelf? If a shelf, what kind? What's the
> ambient temperature in the machine room?

Could be cables - I will get a replacement to verify that. I'm less sure it is the drives.

Yes, all 4 drives were in the same machine. The machine is a regular 2U rackmount chassis (one CPU) with proper airflow. Each drive has its individual aluminum fan as well. The chassis sits in a 47U cabinet in a datacenter environment, with lots of free space around. So I'm quite sure it is not a cooling/dust issue..

Well, unfortunately, I don't have access to the hardware myself, so I can't do any hardware-related tasks. As said, I will get those two drives shipped to me, and will then see for myself if it is really an HDD issue or something else..

> DES

regards,
M.
ad0: FAILURE - WRITE_DMA
Hi,

This question has probably been discussed numerous times, but I'm somewhat unsure what really causes ATA failures..

I have a pretty basic server here which has two IDE drives - each is 200GB. The system is FreeBSD-5.2.1-p9.

That server was set up about 9 months ago, and just about 3 months ago my logs quickly filled up with:

    ad0: FAILURE - WRITE_DMA status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=268435455

The server was still running, but I was unable to write to certain files/folders on the drive - whenever I tried to access $HOME/.fetchmailrc, for example, it wouldn't read/write the file and the system would fire up a message similar to the above.

After a couple of reboots, I started getting more and more of these, and the server was unusable, so I had to shut down all services and mount the drives read-only to back up the data..

At first I thought this could be related to poor cooling of the parts, so the drives could easily overheat in the long run. After a successful backup, I purchased two new drives, with two aluminum drive fans. The new drives' models were identical to the old ones -

    ad0 <ST3200822A/3.01> ATA/ATAPI rev 6

which is Seagate's 200GB drive.

I reloaded the OS on the new drives, then restored all data from the old drives. All seemed to be fine for 2 months... but today I woke up and noticed these messages again.

So now the whole situation leads me to a question - are there some issues with the ATA driver/system [or filesystem?] on FreeBSD-5.2.1? What can I do to stop these frequent failures? How do I diagnose the drives remotely, and see whether it is really a hardware issue or something else (I don't have local access to the server - it is sitting overseas)?

It seems to me that if I continue running the system as it is now, I will have these failed drives every 1-2 months! It does not sound like a normal situation.

I am running FreeBSD-5.2.1-p9, the filesystem is UFS2, and all partitions [except for /] have softupdates on.

The kernel is built from GENERIC, with only the ipfw options added.

regards,
M.