Re: Transient disk failures not retried?

Dan Jones Wed, 17 Jan 2001 15:58:01 -0800
Max TenEyck Woodbury wrote:
> 
> Dan Jones wrote:
> >
> > Max TenEyck Woodbury wrote:
> >> ...
> >>
> >> My impression is that no error recovery is being tried
> >> for transient failures. Could someone confirm this?
> >> If this is correct, what changes are need to be made
> >> to correct the problem?
> >>
> >
> > Well, as I imagine you already know, your basic SCSI
> > setup is marginal.
> 
> OK. I said there was a problem myself, but I also said
> that it seems to have been fixed or at least improved
> recently...
> 
> > I base this on your recitation of problems i.e. both
> > the spare and drives being taken offline and the fact you are
> > using drive trays. In fact, the trays are probably causing
> > most (all?) of your problems. Do you really need them?
> 
> Sorry, but the system would not even be running without the drive
> trays. Without the extra fans (and temperature monitors), the
> drives run HOT (> 45 C). With the trays, the monitors show the
> drives running within temperature specs. (29, 35, 36.5 and 37
> at this moment; all less than specs limit of 40.) I could just
> use extra fans, but how would I know if they were effective
> without the monitors and air plenum.
> 
> > If you really need them, it is possible that reducing the maximum
> > SCSI transfer rate might help in your case.
> 
> Might be OK, but it doesn't seem to be the real problem. How?
> 
> > But, this is all speculation. What are the SCSI errors that cause
> > the drive(s) to be taken offline? Can't really tell what is going
> > on without the SCSI status.
> 
> Jan 16 04:03:37 oscar kernel: scsi1: MEDIUM ERROR on channel 0, id 0, lun 0,
>     CDB: Read (10) 00 04 42 00 90 00 00 08 00
> Jan 16 04:03:37 oscar kernel: Info fld=0x4420097, Current sd08:21:
>     sense key Medium Error
> Jan 16 04:03:37 oscar kernel: Additional sense indicates Unrecovered read error
> Jan 16 04:03:37 oscar kernel: scsidisk I/O error: dev 08:21, sector 71434352
> 
> Which tells me that this is a drive problem, not a bus problem.
> Ten years of experience with hard drives and SCSI and the behavior
> of this particular drive tells me that this is a marginal drive that
> needs to be watched like a hawk, but it has not yet turned the corner
> and become unusable and may not do so for weeks or years.
> 
> The RAID driver then kicks the drive out of the array rather than
> instituting normal read recovery procedures, which is what I complained
> about.
> 

Well, let's see, you have a hard (unrecovered) read error followed by
a SCSI parity error (see below). So, you seem to have both a marginal 
drive and a marginal SCSI bus. Perhaps this example is unrelated to your 
original request for additional error recovery for transient errors. In my 
experience, being able to continue from a single, recoverable SCSI error 
is reasonable, but recovery from two SCSI errors, of any kind, is not.
As you indicated, you are trying to make do with the equipment you have,
but that doesn't change the natural laws. Since you need the trays for 
cooling, one option left is to reduce the SCSI transfer rate and give it
a try. The maximum rate is normally a setting in the adapter's SCSI BIOS.

> Jan 16 04:03:37 oscar kernel: raid5: Disk failure on sdc1, disabling device.
>     Operation continuing on 2 devices
> Jan 16 04:03:37 oscar kernel: md: updating md0 RAID superblock on device
> Jan 16 04:03:37 oscar kernel: sdf1 [events: 0000006e](write)
>     sdf1's sb offset: 35839872
> Jan 16 04:03:37 oscar kernel: md: recovery thread got woken up ...
> Jan 16 04:03:37 oscar kernel: md0: resyncing spare disk sdf1 to
>     replace failed disk
> Jan 16 04:03:37 oscar kernel: RAID5 conf printout:
> ...
> 
> After that there seems to be a sym53c895 driver problem with a
> lot of messages about aborted commands and register dumps that
> might be a bus problem but looks more like a SNAFU in the low and
> mid-level driver error handling...
> 
> Jan 16 04:03:37 oscar kernel: raid5: restarting stripe 71434352
> Jan 16 04:03:37 oscar kernel: sym53c895-0:3: ERROR (0:18) (0-20-0) (1f/9f)
>     @ (script bc0:18000a00).
> Jan 16 04:03:37 oscar kernel: sym53c895-0: script cmd = 18000000
> Jan 16 04:03:37 oscar kernel: sym53c895-0:
>     regdump: da 10 80 9f 47 1f 03 0b 56 00 83 20 80 01 08 01.
> Jan 16 04:03:37 oscar kernel: sym53c895-0: resetting, command processing
>     suspended for 2 seconds
> Jan 16 04:03:37 oscar kernel: sym53c895-0: restart (scsi reset).
> Jan 16 04:03:37 oscar kernel: sym53c895-0: enabling clock multiplier
> Jan 16 04:03:37 oscar kernel: sym53c895-0: Downloading SCSI SCRIPTS.
> Jan 16 04:03:40 oscar kernel: sym53c895-0: command processing resumed
> Jan 16 04:03:40 oscar kernel: sym53c895-0-<3,*>: WIDE SCSI (16 bit) enabled.
> Jan 16 04:03:40 oscar kernel: sym53c895-0-<2,*>: WIDE SCSI (16 bit) enabled.
> Jan 16 04:03:40 oscar kernel: sym53c895-0-<1,*>: WIDE SCSI (16 bit) enabled.
> Jan 16 04:03:40 oscar kernel: sym53c895-0-<2,*>: FAST-40 WIDE SCSI 80.0 MB/s
>     (25 ns, offset 31)
> Jan 16 04:03:40 oscar kernel: sym53c895-0-<1,*>: FAST-40 WIDE SCSI 80.0 MB/s
>     (25 ns, offset 31)
> Jan 16 04:03:40 oscar kernel: sym53c895-0-<3,*>: FAST-40 WIDE SCSI 80.0 MB/s
>     (25 ns, offset 31)
> Jan 16 04:03:40 oscar kernel: sym53c895-0:3: ERROR (0:48) (0-20-0)
>     (1f/9f) @ (script be0:18000600).
> Jan 16 04:03:40 oscar kernel: sym53c895-0: script cmd = 18000000
> Jan 16 04:03:40 oscar kernel: sym53c895-0:
>     regdump: da 10 80 9f 47 1f 03 0b 35 00 81 20 80 01 08 09.
> Jan 16 04:03:40 oscar kernel: sym53c895-0: resetting, command processing
>     suspended for 2 seconds
> Jan 16 04:03:40 oscar kernel: sym53c895-0: restart (scsi reset).
> Jan 16 04:03:40 oscar kernel: sym53c895-0: enabling clock multiplier
> Jan 16 04:03:40 oscar kernel: sym53c895-0: Downloading SCSI SCRIPTS.
> Jan 16 04:03:43 oscar kernel: sym53c895-0: command processing resumed
> Jan 16 04:03:43 oscar kernel: sym53c895-0-<3,*>: WIDE SCSI (16 bit) enabled.
> Jan 16 04:03:43 oscar kernel: sym53c895-0-<2,*>: WIDE SCSI (16 bit) enabled.
> Jan 16 04:03:43 oscar kernel: sym53c895-0-<1,*>: WIDE SCSI (16 bit) enabled.
> Jan 16 04:03:43 oscar kernel: sym53c895-0-<3,*>: FAST-40 WIDE SCSI 80.0 MB/s
>     (25 ns, offset 31)
> Jan 16 04:03:43 oscar kernel: sym53c895-0-<2,*>: FAST-40 WIDE SCSI 80.0 MB/s
>     (25 ns, offset 31)
> Jan 16 04:03:43 oscar kernel: sym53c895-0-<1,*>: FAST-40 WIDE SCSI 80.0 MB/s
>     (25 ns, offset 31)
> Jan 16 04:03:47 oscar kernel: scsi1 channel 0 :
>     resetting for second half of retries.
> Jan 16 04:03:47 oscar kernel: SCSI bus is being reset for host 1 channel 0.
> Jan 16 04:03:47 oscar kernel: sym53c8xx_reset: pid=7654664 reset_flags=1
>     serial_number=0 serial_number_at_timeout=0
> Jan 16 04:03:47 oscar kernel: sym53c895-0: resetting, command processing
>     suspended for 2 seconds
> Jan 16 04:03:47 oscar kernel: scsi1: device driver called scsi_done()
>     for a syncronous reset.
> Jan 16 04:03:47 oscar kernel: sym53c895-0: restart (scsi reset).
> Jan 16 04:03:47 oscar kernel: sym53c895-0: enabling clock multiplier
> Jan 16 04:03:47 oscar kernel: sym53c895-0: Downloading SCSI SCRIPTS.
> Jan 16 04:03:47 oscar kernel: sym53c895-0: command processing resumed
> Jan 16 04:03:47 oscar kernel: sym53c895-0-<3,*>: WIDE SCSI (16 bit) enabled.
> Jan 16 04:03:47 oscar kernel: sym53c895-0-<3,*>: FAST-40 WIDE SCSI 80.0 MB/s
>     (25 ns, offset 31)
> Jan 16 04:03:47 oscar kernel: sym53c895-0-<1,*>: WIDE SCSI (16 bit) enabled.
> Jan 16 04:03:47 oscar kernel: sym53c895-0-<2,*>: WIDE SCSI (16 bit) enabled.
> Jan 16 04:03:47 oscar kernel: sym53c895-0-<1,*>: FAST-40 WIDE SCSI 80.0 MB/s
>     (25 ns, offset 31)
> Jan 16 04:03:47 oscar kernel: sym53c895-0-<2,*>: FAST-40 WIDE SCSI 80.0 MB/s
>     (25 ns, offset 31)
> Jan 16 04:03:47 oscar kernel: SCSI disk error : host 1 channel 0 id 3 lun 0
>     return code = 18000002
> Jan 16 04:03:47 oscar kernel: Info fld=0x60, Current sd08:51: sense key
>     Aborted Command
> Jan 16 04:03:47 oscar kernel: Additional sense indicates Scsi parity error
                                                           ^^^^^^^^^^^^^^^^^
> Jan 16 04:03:47 oscar kernel: scsidisk I/O error: dev 08:51, sector 64
> Jan 16 04:03:47 oscar kernel: interrupting MD-thread pid 6
> Jan 16 04:03:47 oscar kernel: raid5: Disk failure on spare sdf1
> Jan 16 04:04:17 oscar kernel: scsi : aborting command due to timeout :
>     pid 7654761, scsi1, channel 0, id 3, lun 0 Write (6) 00 0a a0 80 00
> Jan 16 04:04:17 oscar kernel: sym53c8xx_abort: pid=7654761 serial_number=7654872
>     serial_number_at_timeout=7654872
> Jan 16 04:04:17 oscar kernel: sym53c895-0: abort ccb=fffffc0003aaa800 (cancel)
> Jan 16 04:04:17 oscar kernel: scsi : aborting command due to timeout :
>     pid 7654760, scsi1, channel 0, id 3, lun 0 Write (6) 00 0a 20 80 00
> Jan 16 04:04:17 oscar kernel: sym53c8xx_abort: pid=7654760 serial_number=7654871
>     serial_number_at_timeout=7654871
> Jan 16 04:04:17 oscar kernel: scsi : aborting command due to timeout :
>     pid 7654759, scsi1, channel 0, id 3, lun 0 Write (6) 00 09 a0 80 00
> Jan 16 04:04:17 oscar kernel: sym53c8xx_abort: pid=7654759 serial_number=7654870
>     serial_number_at_timeout=7654870
> Jan 16 04:04:17 oscar kernel: scsi : aborting command due to timeout :
>     pid 7654758, scsi1, channel 0, id 3, lun 0 Write (6) 00 09 20 80 00
> Jan 16 04:04:17 oscar kernel: sym53c8xx_abort: pid=7654758 serial_number=7654869
>     serial_number_at_timeout=7654869
> Jan 16 04:04:17 oscar kernel: scsi : aborting command due to timeout :
>     pid 7654757, scsi1, channel 0, id 3, lun 0 Write (6) 00 08 a0 80 00
> Jan 16 04:04:17 oscar kernel: sym53c8xx_abort: pid=7654757 serial_number=7654868
>     serial_number_at_timeout=7654868
> Jan 16 04:04:17 oscar kernel: scsi : aborting command due to timeout :
>     pid 7654767, scsi1, channel 0, id 3, lun 0 Write (6) 00 0b 20 30 00
> Jan 16 04:04:17 oscar kernel: sym53c8xx_abort: pid=7654767 serial_number=7654878
>     serial_number_at_timeout=7654878
> Jan 16 04:04:17 oscar kernel: sym53c895-0: abort ccb=fffffc00002eb000 (cancel)
> Jan 16 04:04:17 oscar kernel: scsi : aborting command due to timeout :
>     pid 7654756, scsi1, channel 0, id 3, lun 0 Write (6) 00 08 20 80 00
> Jan 16 04:04:17 oscar kernel: sym53c8xx_abort: pid=7654756 serial_number=7654895
>     serial_number_at_timeout=7654895
> Jan 16 04:04:17 oscar kernel: sym53c895-0: abort ccb=fffffc0003aa4800 (cancel)
> Jan 16 04:04:17 oscar kernel: scsi : aborting command due to timeout :
>     pid 7654785, scsi1, channel 0, id 2, lun 0 Read (10) 00 03 5c 0e 10 00 00 08 00
> Jan 16 04:04:17 oscar kernel: sym53c8xx_abort: pid=7654785 serial_number=7654897
>     serial_number_at_timeout=7654897
> Jan 16 04:04:17 oscar kernel: sym53c895-0: abort ccb=fffffc0002e37800 (cancel)
> Jan 16 04:04:17 oscar kernel: scsi : aborting command due to timeout :
>     pid 7654784, scsi1, channel 0, id 1, lun 0 Read (10) 00 03 5c 0e 10 00 00 08 00
> Jan 16 04:04:17 oscar kernel: sym53c8xx_abort: pid=7654784 serial_number=7654896
>     serial_number_at_timeout=7654896
> Jan 16 04:04:17 oscar kernel: sym53c895-0: abort ccb=fffffc0002226000 (cancel)
> Jan 16 04:04:20 oscar kernel: SCSI host 1 abort (pid 7654761) timed out - resetting
> Jan 16 04:04:20 oscar kernel: SCSI bus is being reset for host 1 channel 0.
> Jan 16 04:04:20 oscar kernel: sym53c8xx_reset: pid=7654761 reset_flags=2
>     serial_number=7654872 serial_number_at_timeout=7654872
> Jan 16 04:04:20 oscar kernel: sym53c895-0: resetting, command processing
>     suspended for 2 seconds
> Jan 16 04:04:20 oscar kernel: sym53c895-0: restart (scsi reset).
> Jan 16 04:04:20 oscar kernel: sym53c895-0: enabling clock multiplier
> Jan 16 04:04:20 oscar kernel: sym53c895-0: Downloading SCSI SCRIPTS.
> Jan 16 04:04:20 oscar kernel: sym53c895-0: command processing resumed
> Jan 16 04:04:20 oscar kernel: sym53c895-0-<3,*>: WIDE SCSI (16 bit) enabled.
> Jan 16 04:04:20 oscar kernel: sym53c895-0-<1,*>: WIDE SCSI (16 bit) enabled.
> Jan 16 04:04:20 oscar kernel: sym53c895-0-<2,*>: WIDE SCSI (16 bit) enabled.
> Jan 16 04:04:20 oscar kernel: sym53c895-0-<3,*>: FAST-40 WIDE SCSI 80.0 MB/s
>     (25 ns, offset 31)
> Jan 16 04:04:20 oscar kernel: sym53c895-0-<1,*>: FAST-40 WIDE SCSI 80.0 MB/s
>     (25 ns, offset 31)
> Jan 16 04:04:20 oscar kernel: sym53c895-0-<2,*>: FAST-40 WIDE SCSI 80.0 MB/s
>     (25 ns, offset 31)
> Jan 16 04:04:20 oscar kernel: SCSI disk error : host 1 channel 0 id 3 lun 0
>     return code = 18000002
> Jan 16 04:04:20 oscar kernel: Info fld=0xa20, Current sd08:51: sense key
>     Aborted Command
> Jan 16 04:04:20 oscar kernel: Additional sense indicates Scsi parity error
> Jan 16 04:04:20 oscar kernel: scsidisk I/O error: dev 08:51, sector 2560
> 
> Followed by a bug report from RAID...
> 
> Jan 16 04:04:20 oscar kernel: interrupting MD-thread pid 6
> Jan 16 04:04:20 oscar kernel: md: bug in file raid5.c, line 659
> Jan 16 04:04:20 oscar kernel:
> Jan 16 04:04:20 oscar kernel:        **********************************
> Jan 16 04:04:20 oscar kernel:        * <COMPLETE RAID STATE PRINTOUT> *
> Jan 16 04:04:20 oscar kernel:        **********************************
> Jan 16 04:04:20 oscar kernel: md0: <sdf1><sdc1><sde1><sdd1> array superblock:
> Jan 16 04:04:20 oscar kernel:   SB: (V:0.90.0) 
>ID:<a0a3d082.d9b7f185.6121a0f1.0a06a2d3>
>     CT:3a39677e
> Jan 16 04:04:20 oscar kernel:      L5 S35839872 ND:5 RD:3 md0 LO:2 CS:131072
> Jan 16 04:04:20 oscar kernel:      UT:3a640e69 ST:0 AD:2 WD:2 FD:3 SD:0 CSUM:00007310
>     E:0000006e
> Jan 16 04:04:20 oscar kernel:      D  0:  DISK<N:0,sdc1(8,33),R:0,S:1>
> Jan 16 04:04:20 oscar kernel:      D  1:  DISK<N:1,sdd1(8,49),R:1,S:6>
> Jan 16 04:04:20 oscar kernel:      D  2:  DISK<N:2,sde1(8,65),R:2,S:6>
> Jan 16 04:04:20 oscar kernel:      D  3:  DISK<N:3,[dev 00:00](0,0),R:3,S:9>
> Jan 16 04:04:20 oscar kernel:      D  4:  DISK<N:4,sdf1(8,81),R:4,S:1>
> Jan 16 04:04:20 oscar kernel:      D  5:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:20 oscar kernel:      D  6:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:20 oscar kernel:      D  7:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:20 oscar kernel:      D  8:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:20 oscar kernel:      D  9:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:22 oscar kernel:      D 10:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:22 oscar kernel:      D 11:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:22 oscar kernel:      THIS:  DISK<N:2,sde1(8,65),R:2,S:6>
> Jan 16 04:04:22 oscar kernel:  rdev sdf1: O:sdf1, SZ:00000000 F:1 DN:4 no rdev 
>superblock!
> Jan 16 04:04:22 oscar kernel:  rdev sdc1: O:sdc1, SZ:00000000 F:1 DN:0 no rdev 
>superblock!
> Jan 16 04:04:22 oscar kernel:  rdev sde1: O:sde1, SZ:35839872 F:0 DN:2 rdev 
>superblock:
> Jan 16 04:04:22 oscar kernel:   SB: (V:0.90.0) 
>ID:<a0a3d082.d9b7f185.6121a0f1.0a06a2d3>
>     CT:3a39677e
> Jan 16 04:04:22 oscar kernel:      L5 S35839872 ND:5 RD:3 md0 LO:2 CS:131072
> Jan 16 04:04:22 oscar kernel:      UT:3a640e69 ST:0 AD:2 WD:3 FD:2 SD:1 CSUM:0000b202
>     E:0000006e
> Jan 16 04:04:22 oscar kernel:      D  0:  DISK<N:0,sdc1(8,33),R:0,S:1>
> Jan 16 04:04:22 oscar kernel:      D  1:  DISK<N:1,sdd1(8,49),R:1,S:6>
> Jan 16 04:04:22 oscar kernel:      D  2:  DISK<N:2,sde1(8,65),R:2,S:6>
> Jan 16 04:04:22 oscar kernel:      D  3:  DISK<N:3,[dev 00:00](0,0),R:3,S:9>
> Jan 16 04:04:22 oscar kernel:      D  4:  DISK<N:4,sdf1(8,81),R:4,S:0>
> Jan 16 04:04:22 oscar kernel:      D  5:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:22 oscar kernel:      D  6:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:22 oscar kernel:      D  7:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:22 oscar kernel:      D  8:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:22 oscar kernel:      D  9:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:22 oscar kernel:      D 10:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:22 oscar kernel:      D 11:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:22 oscar kernel:      THIS:  DISK<N:2,sde1(8,65),R:2,S:6>
> Jan 16 04:04:22 oscar kernel:  rdev sdd1: O:sdd1, SZ:35839872 F:0 DN:1 rdev 
>superblock:
> Jan 16 04:04:22 oscar kernel:   SB: (V:0.90.0) 
>ID:<a0a3d082.d9b7f185.6121a0f1.0a06a2d3>
>     CT:3a39677e
> Jan 16 04:04:24 oscar kernel:      L5 S35839872 ND:5 RD:3 md0 LO:2 CS:131072
> Jan 16 04:04:24 oscar kernel:      UT:3a640e69 ST:0 AD:2 WD:3 FD:2 SD:1 CSUM:0000b1f0
>     E:0000006e
> Jan 16 04:04:24 oscar kernel:      D  0:  DISK<N:0,sdc1(8,33),R:0,S:1>
> Jan 16 04:04:24 oscar kernel:      D  1:  DISK<N:1,sdd1(8,49),R:1,S:6>
> Jan 16 04:04:24 oscar kernel:      D  2:  DISK<N:2,sde1(8,65),R:2,S:6>
> Jan 16 04:04:24 oscar kernel:      D  3:  DISK<N:3,[dev 00:00](0,0),R:3,S:9>
> Jan 16 04:04:24 oscar kernel:      D  4:  DISK<N:4,sdf1(8,81),R:4,S:0>
> Jan 16 04:04:24 oscar kernel:      D  5:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:24 oscar kernel:      D  6:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:24 oscar kernel:      D  7:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:24 oscar kernel:      D  8:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:24 oscar kernel:      D  9:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:24 oscar kernel:      D 10:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:24 oscar kernel:      D 11:  DISK<N:0,[dev 00:00](0,0),R:0,S:0>
> Jan 16 04:04:24 oscar kernel:      THIS:  DISK<N:1,sdd1(8,49),R:1,S:6>
> Jan 16 04:04:24 oscar kernel:        **********************************
> Jan 16 04:04:24 oscar kernel:
> Jan 16 04:04:24 oscar kernel:  <SPARE FAILED!>
> Jan 16 04:04:24 oscar kernel:  <6>md0: spare disk sdf1 failed, skipping to next 
>spare.
> ...
> 
> > ...
> >
> > I don't think that a hardware RAID would help in this case,
> > although in a marginal system any change can have surprising
> > effects. I take the same line regarding mixing drives as the
> > RAID vendors, but it is more of a configuration matrix problem
> > than one of functionality. I just can't test every possible
> > combination, so I can't make a blanket recommendation except to
> > recommend against. Having said that, I have never experienced
> > any problems mixing drives on Mylex RAID controllers.
> 
> At present, the three drives that make up the array are
> identical with the spare being just slightly smaller and
> slower. The problem is that I may need to replace a drive
> a year or two into the arrays lifetime and identical drives
> may no longer be available. The local rep for the hardware
> RAID vendor told me that all the drives had to be the same
> or the warantee was void. He may have been talking through
> his hat, but the lack of support was enough to throw the
> discussion to the software RAID solution.
> 
> However, these are all side issues to the question about
> attempting the normal retry process before kicking the drive
> out of the array.

-- 
Dan Jones, Manager, Storage Products          VA Linux Systems
V:(510)687-6737 F:(510)683-8602               47071 Bayside Parkway
[EMAIL PROTECTED]                            Fremont, CA 94538
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
Re: Transient disk failures not retried?

Reply via email to