James Bottomley wrote:
On Thu, 2005-02-17 at 14:27 +1000, Douglas Gilbert wrote:

Recent SPC-3 and SBC-2 drafts treat the sense keys of
MEDIUM ERROR and HARDWARE ERROR in a similar way.
Both can return an "info" field which has the same
meaning (LBA of first failure). The distinction is that
MEDIUM ERROR is a little more precise (at least for
magnetic rotating media) **. For flash RAM the distinction
is moot.


My copy of SPC-3 (r21d) still defined HARDWARE ERROR in Table 27 as

HARDWARE ERROR: Indicates that the device server detected a
non-recoverable hardware failure (e.g., controller failure, device
failure, or parity error) while performing the command or during a
self test.

which looks pretty non-retryable to me ... where does it say that the
error might be retryable?

James,
The definition of MEDIUM ERROR from the same table:
"Indicates that the command terminated with a non-recoverable
error condition that may have been caused by a flaw in the
medium or an error in the recorded data. This sense key may
also be returned if the device server is unable to distinguish
between a flaw in the medium and a specific hardware failure
(i.e. sense key 4h)". Sense key "4h" is HARDWARE ERROR.

I interpret that as SPC-3 saying MEDIUM ERROR and
HARDWARE ERROR may both report non-recoverable errors.
Also note that MEDIUM ERROR, HARDWARE ERROR and RECOVERED
ERROR can return an "actual retry count" in their additional
sense data.

SBC-2 (rev 16) makes little distinction between
the two sense keys for "unrecovered read errors": table 4 shows
either can be used. It also says on page 19: "When
an unrecovered read error is reported the information field
of the sense data shall contain the LBA of the unrecovered
logical block."
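
To make that wording concrete, here is a minimal Python sketch of
pulling the sense key and the information field out of fixed-format
sense data (response code 70h or 71h). The function name is made up
for illustration, descriptor-format sense data is not handled, and
this is not kernel code. The information field is only treated as an
LBA when the VALID bit is set.

```python
import struct

MEDIUM_ERROR = 0x3
HARDWARE_ERROR = 0x4


def decode_fixed_sense(sense: bytes):
    """Return (sense_key, lba) from fixed-format sense data.

    lba is None when the VALID bit is clear. Hypothetical helper,
    written only to illustrate the byte layout.
    """
    if len(sense) < 8:
        raise ValueError("fixed-format sense data needs at least 8 bytes")
    if (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    valid = bool(sense[0] & 0x80)        # VALID: bit 7 of byte 0
    sense_key = sense[2] & 0x0F          # low nibble of byte 2
    (information,) = struct.unpack_from(">I", sense, 3)  # bytes 3..6, big-endian
    return sense_key, (information if valid else None)


# Example: MEDIUM ERROR with VALID set and information = 0x0001E240
sample = bytes([0xF0, 0x00, 0x03, 0x00, 0x01, 0xE2, 0x40, 0x0A]) + bytes(10)
key, lba = decode_fixed_sense(sample)
```

The same decoding applies whether the key is MEDIUM ERROR (3h) or
HARDWARE ERROR (4h), which is the point: the information field has
the same meaning for both.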

Nothing that I can see links an "unrecovered (read) error" with
the application client retrying the same command in either draft.
If "actual retry count" is > 1 in the sense key specific field
then that implies the device has already tried several times.
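
As a rough illustration, the "actual retry count" can be read from
the sense key specific bytes of fixed-format sense data as below.
This is a Python sketch only; the function name is hypothetical, and
it assumes the fixed-format layout (SKSV in bit 7 of byte 15, the
count in bytes 16 and 17).

```python
RETRY_COUNT_KEYS = (0x1, 0x3, 0x4)  # RECOVERED, MEDIUM, HARDWARE ERROR


def actual_retry_count(sense: bytes):
    """Return the device's actual retry count, or None if not reported.

    Hypothetical helper for fixed-format sense data only.
    """
    if len(sense) < 18:
        return None                      # sense key specific bytes absent
    if (sense[0] & 0x7F) not in (0x70, 0x71):
        return None                      # not fixed format
    if (sense[2] & 0x0F) not in RETRY_COUNT_KEYS:
        return None                      # field has another meaning
    if not (sense[15] & 0x80):
        return None                      # SKSV clear: field not valid
    return (sense[16] << 8) | sense[17]  # 16-bit big-endian count


# Example: HARDWARE ERROR reporting that the device already retried 5 times
buf = bytearray(18)
buf[0], buf[2], buf[15], buf[17] = 0x70, 0x04, 0x80, 0x05
count = actual_retry_count(bytes(buf))
```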

SSC-3 (for tape drives) also allows MEDIUM ERROR or HARDWARE ERROR
to indicate an unrecovered read error (rev 1c, table 2). For tape
drives, retrying the same command is probably not appropriate. [I
note that st and sg set their 'max_retries' to 0 to inhibit this.]
MMC-5 only mentions the HARDWARE ERROR sense key for a self
diagnostic failure.

This analysis leads me to question why retries are instigated
from the mid level and not from the sd driver (and perhaps the sr
driver as well). If retries were moved there, sd should not
instigate them when the device indicates that a reasonable number
of retries have already taken place, unless it can change some
other factor or is instructed to by some parameter to sd.
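
Such a policy might look something like the following Python sketch.
Everything in it is hypothetical: the function, its parameters and
the default values are assumptions for illustration and do not
describe what sd or the mid level actually do.

```python
def should_retry(device_retry_count, retries_so_far,
                 max_retries=5, device_retry_threshold=1,
                 force_retry=False):
    """Decide whether to reissue a command that failed with
    MEDIUM ERROR or HARDWARE ERROR.

    device_retry_count: "actual retry count" from the sense data,
                        or None if the device did not report one.
    retries_so_far:     retries the driver has already issued.
    max_retries:        driver limit (illustrative default).
    force_retry:        stands in for "some parameter to sd".
    """
    if retries_so_far >= max_retries:
        return False          # driver limit reached (always, if limit is 0)
    if force_retry:
        return True           # explicitly instructed to keep trying
    if (device_retry_count is not None
            and device_retry_count > device_retry_threshold):
        return False          # device already tried several times
    return True
```

With max_retries set to 0, as st and sg do, this never asks for a
retry. A device that reports no retry count still gets retried, which
would cover the case Alan describes below.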


As Alan Stern points out, my patch fails the reality test. The
device in question obviously required a retry when it returned
a HARDWARE ERROR sense key (but perhaps the reason was not an
unrecovered error or it was not reported properly).

Doug Gilbert

