Linus Torvalds wrote:
> ...
> 
> Because the low-level filesystems _have_ already re-tried. So there's no
> point in the MD device doing the same thing. Once a low-level device has
> an error, we've done all the retries it's sane to do (sometimes a lot
> more), and MD retrying would only make error recovery slower by
> multiplying the retry-time by some number.

My sincere apologies; that may be what is supposed to happen, but it is
not what I am seeing. Direct access to the device /dev/sda or the
partition /dev/sda1, similar to what the md device does, doesn't appear to
be retried in the few tests I have run on slightly flaky hardware.

Specifically, I have a set of old magneto-optical cartridges and a drive
that gets flaky when used for too long. (It recovers if allowed to cool
off. Magnificent for testing error handling, but only so-so for actually
getting work done; that's another story.) The error codes returned
and, more importantly, the system log show only a single attempt at each
failing read. (As far as I can tell, the ancient hardware did no retries
for the error I was seeing.) A single retry at the application level
recovered the data in approximately 95% of the errors. (Another 7 retries
added only a little to the overall success rate, so you're absolutely
right about extra retries generally being a waste of time.)
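For illustration only, here is a minimal sketch of the kind of application-level retry described above, assuming a Unix system, Python's os.pread, and that the drive reports a failed read as EIO. The function name read_block_with_retry and the default of two attempts are hypothetical choices, not taken from my actual recovery program.

```python
import errno
import os


def read_block_with_retry(fd: int, offset: int, size: int, max_attempts: int = 2) -> bytes:
    """Read `size` bytes at `offset` from `fd`, retrying on EIO.

    Hypothetical helper: max_attempts=2 means the first read plus one retry.
    Errors other than EIO are raised immediately; if every attempt fails
    with EIO, the last error is re-raised to the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_err = None
    for _ in range(max_attempts):
        try:
            # pread does not move the file offset, so each retry rereads the same block.
            return os.pread(fd, size, offset)
        except OSError as err:
            if err.errno != errno.EIO:
                raise  # not a media error; retrying will not help
            last_err = err
    raise last_err
```

A caller would open the device read-only (e.g. os.open("/dev/sda1", os.O_RDONLY)) and walk it block by block, logging any block that still fails. The default of one retry reflects the numbers above: the first retry recovered most of the data, and further attempts gained little.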

After building the data recovery program for the opticals, I wondered
about retries and read the SCSI, sd, and host driver code to find out
why I wasn't seeing them. I found that the magnificent recovery edifice
in the SCSI protocol layer was not used. The much simpler error
recovery in the sd driver seemed to be the only error recovery
actually used, and it did no retries.

I'm reading the code over for the third or fourth time now, and I'm
still a little confused about why this is happening. Does the low-level
(host?) driver have to do something to make error recovery work?

> Doing the same thing twice doesn't make it better.

Of course you're right, but I'm not sure it's being done even once.

[EMAIL PROTECTED]