Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-27 Thread Matthew Seaman
On 27/06/2010 24:04:48, Matthew Lear wrote: Incidentally, is there a way to easily migrate from an atacontrol-created array to a gmirror-created array? I'm running FreeBSD 8.0 on another machine with a gmirror-created RAID1 array with no problem

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-27 Thread Matthew Lear
On Sun, 2010-06-27 at 09:36 +0100, Matthew Seaman wrote: On 27/06/2010 24:04:48, Matthew Lear wrote: Incidentally, is there a way to easily migrate from an atacontrol-created array to a gmirror-created array? I'm running FreeBSD 8.0 on another

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-26 Thread Matthew Lear
On Fri, 2010-06-25 at 00:16 -0700, Jeremy Chadwick wrote: All in all, replacing a drive is a completely reasonable action when there's evidence confirming the need for its replacement. I don't like replacing hardware when there's no indication replacing it will necessarily fix the problem;

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-26 Thread Jeremy Chadwick
On Sat, Jun 26, 2010 at 04:57:48PM +0100, Matthew Lear wrote: On Fri, 2010-06-25 at 00:16 -0700, Jeremy Chadwick wrote: All in all, replacing a drive is a completely reasonable action when there's evidence confirming the need for its replacement. I don't like replacing hardware when

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-26 Thread Matthew Lear
On Sat, 2010-06-26 at 10:12 -0700, Jeremy Chadwick wrote: On Sat, Jun 26, 2010 at 04:57:48PM +0100, Matthew Lear wrote: On Fri, 2010-06-25 at 00:16 -0700, Jeremy Chadwick wrote: All in all, replacing a drive is a completely reasonable action when there's evidence confirming the need

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-25 Thread Jeremy Chadwick
On Thu, Jun 24, 2010 at 05:22:41PM -0500, Adam Vande More wrote: Haven't followed the entire thread, but wanted to point out something important to remember. SMART is not a reliable indicator of failure. It's certainly better than listening to it but it picks up less than 1/2 of drive

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-24 Thread Matthew Lear
On Tue, 2010-06-22 at 20:04 +0100, Bob Bishop wrote: Hi, On 22 Jun 2010, at 08:45, Jeremy Chadwick wrote: On Mon, Jun 21, 2010 at 10:33:12PM +0100, Matthew Lear wrote: [tale of woe elided] I don't really have any other thoughts on the matter, sadly. [helpful suggestions elided]

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-24 Thread Jeremy Chadwick
On Thu, Jun 24, 2010 at 06:52:14PM +0100, Matthew Lear wrote: On Tue, 2010-06-22 at 20:04 +0100, Bob Bishop wrote: Hi, On 22 Jun 2010, at 08:45, Jeremy Chadwick wrote: On Mon, Jun 21, 2010 at 10:33:12PM +0100, Matthew Lear wrote: [tale of woe elided] I don't really have any

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-24 Thread Matthew Lear
On Thu, 2010-06-24 at 11:15 -0700, Jeremy Chadwick wrote: On Thu, Jun 24, 2010 at 06:52:14PM +0100, Matthew Lear wrote: On Tue, 2010-06-22 at 20:04 +0100, Bob Bishop wrote: Hi, On 22 Jun 2010, at 08:45, Jeremy Chadwick wrote: On Mon, Jun 21, 2010 at 10:33:12PM +0100, Matthew

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-24 Thread Adam Vande More
Haven't followed the entire thread, but wanted to point out something important to remember. SMART is not a reliable indicator of failure. It's certainly better than listening to it but it picks up less than 1/2 of drive failures. Google released a study of their disks in data centers a few years

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-22 Thread Jeremy Chadwick
On Mon, Jun 21, 2010 at 10:33:12PM +0100, Matthew Lear wrote: Hello Jeremy. I just wondered if you had any further thoughts on the info below. Two new disks arrived over the weekend and I'm still unsure if I'm best to replace ad0 or not... Much appreciated indeed. -- Matt On Fri,

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-22 Thread Bob Bishop
Hi, On 22 Jun 2010, at 08:45, Jeremy Chadwick wrote: On Mon, Jun 21, 2010 at 10:33:12PM +0100, Matthew Lear wrote: [tale of woe elided] I don't really have any other thoughts on the matter, sadly. [helpful suggestions elided] Anyone else have ideas/recommendations? The disks sure look

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-21 Thread Matthew Lear
Hello Jeremy. I just wondered if you had any further thoughts on the info below. Two new disks arrived over the weekend and I'm still unsure if I'm best to replace ad0 or not... Much appreciated indeed. -- Matt On Fri, 2010-06-18 at 20:28 +0100, Matthew Lear wrote: On Fri, 2010-06-18 at 10:42

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-19 Thread Andriy Gapon
on 18/06/2010 20:42 Jeremy Chadwick said the following: http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting I've always read IDNF to mean OS requested access (read or write) to an LBA which is out of bounds, where out of bounds means not between 0 and last LBA. How exactly

7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-18 Thread Matthew Lear
Hi there, I'm running 7.2-RELEASE-p4 on an i386 HP server (ML G5) in RAID1 configuration. Very recently, I've seen IO errors such as: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=20472527 reported and the RAID mirror is now offline. ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left)

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-18 Thread Pieter de Boer
Hi Matthew, I'm running 7.2-RELEASE-p4 on an i386 HP server (ML G5) in RAID1 configuration. Very recently, I've seen IO errors such as: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=20472527 reported and the RAID mirror is now offline. ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left)

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-18 Thread Jeremy Chadwick
On Fri, Jun 18, 2010 at 08:08:24AM +0100, Matthew Lear wrote: Hi there, I'm running 7.2-RELEASE-p4 on an i386 HP server (ML G5) in RAID1 configuration. Very recently, I've seen IO errors such as: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=20472527 reported and the RAID mirror

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-18 Thread Miroslav Lachman
Jeremy Chadwick wrote: On Fri, Jun 18, 2010 at 08:08:24AM +0100, Matthew Lear wrote: [...] The drives in the RAID exist on two separate ATA channels: [r...@meshuga /home/matt]# atacontrol list ATA channel 0: Master: ad0 <WDC WD3200AAKS-00VYA0/12.01B02> SATA revision 2.x Slave:

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-18 Thread Jeremy Chadwick
On Fri, Jun 18, 2010 at 01:36:53PM +0200, Miroslav Lachman wrote: Jeremy Chadwick wrote: On Fri, Jun 18, 2010 at 08:08:24AM +0100, Matthew Lear wrote: [...] The drives in the RAID exist on two separate ATA channels: [r...@meshuga /home/matt]# atacontrol list ATA channel 0: Master:

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-18 Thread Alexander Motin
Jeremy Chadwick wrote: On Fri, Jun 18, 2010 at 01:36:53PM +0200, Miroslav Lachman wrote: Jeremy Chadwick wrote: On Fri, Jun 18, 2010 at 08:08:24AM +0100, Matthew Lear wrote: [...] The drives in the RAID exist on two separate ATA channels: [r...@meshuga /home/matt]# atacontrol list ATA

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-18 Thread Matthew Lear
Hello Jeremy, Thanks very much for the feedback. [snip] Could you please provide the full output from smartctl -a /dev/ad0 here? Your drive may be completely fine and you may not have to swap it at all; hard to say. Sure. See below: smartctl 5.39.1 2010-01-28 r3054 [FreeBSD 7.2-RELEASE-p4

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-18 Thread Jeremy Chadwick
On Fri, Jun 18, 2010 at 04:47:11PM +0100, Matthew Lear wrote: Hello Jeremy, Thanks very much for the feedback. [snip] Could you please provide the full output from smartctl -a /dev/ad0 here? Your drive may be completely fine and you may not have to swap it at all; hard to say. Sure.

Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-18 Thread Matthew Lear
On Fri, 2010-06-18 at 10:42 -0700, Jeremy Chadwick wrote: On Fri, Jun 18, 2010 at 04:47:11PM +0100, Matthew Lear wrote: Hello Jeremy, Thanks very much for the feedback. [snip] Could you please provide the full output from smartctl -a /dev/ad0 here? Your drive may be completely