Max TenEyck Woodbury wrote:
> 
> I have an APX system that I am using as a file server.
> The data drive is a RAID5 construct of 3 36 GB 10K rpm
> drives and a 36 GB 7200 rpm spare drive. The drives are
> in a separate enclosure with extra fans and each drive
> is in a swap tray with a temperature monitor and extra
> fan. (I think the disk failure rate in this building
> is a bit higher than it should be and one of the causes
> may have been drive overheating.) The machine has its
> own UPS. (We installed UPS on all machines in the
> building because the disk failure rate was definitely
> too high at that point. Power company changes also
> improved building power considerably about two years
> ago.)
> 
> From time to time (about 3 times a week) one of the
> drives (always the same drive) gets kicked out of the
> array. Originally, this brought in a spare drive. The
> reconstruction used to cause a massive flood of bus
> errors and hang the system. As a result, I took the
> spare off-line.
> 
> SCSI bus is LVD with active termination. One terminator
> is provided by the Symbios 53c895 card and the other is
> on the port where SCSI bus exits the drive enclosure.
> Internal reflections have been minimized by wrapping
> the ribbon cable in the enclosure in bubble wrap and
> placing another layer of bubble wrap on any flat
> metallic surfaces where the cable could make contact.
> (This was done when the enclosure was first installed.)
> 
> A problem with the swap trays was found. The 10K
> drives are barely small enough to fit in the trays.
> On another system, the contact between the bottom of
> the swap tray and the circuit board would cause bus
> errors, sometimes on the drive with the problem and
> sometimes on other drives. A single layer of tape on
> the inside of the bottom of the swap tray solved that
> problem. I was finally able to take the server down
> and apply the same treatment to it at the end of last
> week. This eliminated the avalanche of bus errors, but
> the spare still gets kicked out before the
> reconstruction can finish.
> 
> System software is Red Hat Linux 6.2 for Alpha.
> Host interface is Symbios 53c895. (There is also an
> Adaptec AHA-294x/AIC-7871 host for the system and
> CD drives.) Primary drives are 3 Quantum ATLAS 10K
> 36WLS. Spare drive is a Quantum ATLAS V 36WLS.
> 
> My impression is that no error recovery is being tried
> for transient failures. Could someone confirm this?
> If this is correct, what changes need to be made
> to correct the problem?
> 

Well, as I imagine you already know, your basic SCSI setup is
marginal. I base this on the problems you describe, i.e. both
the spare and the array drives being taken offline, and on the
fact that you are using drive trays. In fact, the trays are
probably causing most (all?) of your problems. Do you really
need them? If you do, reducing the maximum SCSI transfer rate
might help in your case.
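
For a rough sense of what stepping the rate down costs, here is a
small sketch of peak bus bandwidth on a 16-bit (wide) bus at the
standard synchronous rates. It assumes the 53c895 is currently
negotiating Ultra2 (40 mega-transfers/s); the function and table
names are illustrative only, and real throughput will be below
these peaks. How the cap is actually set depends on the driver and
is not shown here.

```python
# Peak SCSI bus bandwidth = synchronous rate (mega-transfers/s) x bus width (bytes).
# Rates are the standard Fast / Ultra / Ultra2 steps; names are illustrative.
SYNC_RATES_MT = {"Fast": 10, "Ultra": 20, "Ultra2": 40}


def peak_bandwidth_mb_s(mega_transfers, width_bits=16):
    """Return the peak bus bandwidth in MB/s for a given sync rate and bus width."""
    return mega_transfers * width_bits // 8


for name, rate in SYNC_RATES_MT.items():
    print(f"{name:6s} wide: {peak_bandwidth_mb_s(rate)} MB/s peak")
```

Each step down halves the peak, so dropping one step is a cheap
experiment if the bus is only marginal at full speed.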

But this is all speculation. What are the SCSI errors that cause
the drive(s) to be taken offline? I can't really tell what is going
on without the SCSI status.
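
If it helps to collect them, something like the sketch below would
pull the candidate lines out of the system log. The keyword list
and the default log path are my assumptions, not anything specific
to your driver, so adjust both to match what your kernel actually
writes.

```python
import re

# Assumed keywords for SCSI / software-RAID messages; adjust to your own logs.
KEYWORDS = re.compile(r"scsi|raid|\bmd\d+\b|parity|timeout|reset", re.IGNORECASE)


def scsi_related(lines):
    """Return the lines that mention any of the assumed keywords."""
    return [line.rstrip("\n") for line in lines if KEYWORDS.search(line)]


def scan_log(path="/var/log/messages"):
    """Read a log file (path is an assumption) and return the matching lines."""
    with open(path, errors="replace") as log:
        return scsi_related(log)
```

Calling scan_log() and posting the lines from just before a drive
is kicked out would give the list something concrete to look at.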

> (No, a hardware RAID solution is NOT an option. I
> have been told that all the drives in a hardware
> RAID system have to be functionally equivalent
> right down to the same revision of the firmware.
> Since it was difficult to get authorization for
> the system in the first place, I could NOT get
> more than three drives to start with. The fact
> that soft-RAID could handle a mixture of different
> drives was the factor that allowed the project
> to be implemented at all.)
> 

I don't think that a hardware RAID would help in this case,
although in a marginal system any change can have surprising
effects. I take the same line regarding mixing drives as the
RAID vendors, but it is more of a configuration matrix problem
than one of functionality. I just can't test every possible
combination, so the only blanket recommendation I can make is
not to mix drives. Having said that, I have never experienced
any problems mixing drives on Mylex RAID controllers.

-- 
Dan Jones, Manager, Storage Products          VA Linux Systems
V:(510)687-6737 F:(510)683-8602               47071 Bayside Parkway
[EMAIL PROTECTED]                            Fremont, CA 94538
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]