RE: [t13] RAID and R/W LONG

McGrath, Jim Thu, 21 Mar 2002 17:00:34 -0800

This message is from the T13 list server.

Harlan,

I agree (as I stated in the section of my message that you did not quote).
However, that is a result of the drive error rate specification.  If you are
not careful you can return data that is in error without an error status
(what we call a "buffer miscompare").  These actually will occur (it is the
drive miscorrection error specification), but the rate is specified by
vendors to be so low that you should never see it under normal use.

However, none of this has to do with the ATA standard per se.  The ATA
standard is entirely silent (as far as I can see) on the topic of defect
management, and auto reallocation in particular.  Indeed, you don't even
need to do defect management to be ATA compliant (some early ATA drives
relied on the host to handle defects).

So you should not start inducing errors via WRITE LONGs and assume the drive
will somehow sort it all out - at least not for a drive that just obeys the
normal error rate and ATA standards.  Of course a specific product may work
fine in this case, and you could always specify this behavior in a purchase
specification (indeed, some customers do put defect management constraints
into their specifications).  But absent that, the ATA standard as written
does not ensure that it will work properly.

Running out of spares is actually the least of the worries.  Suppose you
corrupt a lot of sectors, and then read them back (triggering errors)?  You
could trip all sorts of internal (and external) signals in the drive causing
side effects.  SMART triggers have been pointed out as one (a READ of a
sector that was corrupted with a WRITE LONG MUST be logged as an error,
since the READ reported an error - 8.51.6.8.2.4 of ATA-6).  Another could be
lowering drive performance (e.g. we could try to slow things down in an
attempt to reduce the number of "excessive" errors we are seeing).
Basically the drive thinks it's failing, and so may end up doing a number of
otherwise undesirable things in order to "save" the data.

This is especially dangerous since a lot of drive READ/WRITE LONG
implementations have probably been static for a long time, while smarter
drive behavior around data reliability is more recent.

If you are using READ LONG/WRITE LONG in a controlled testing environment,
then this is probably not an issue.  But using it for a field feature is
dangerous if you just rely on the ATA standard.

Jim

-----Original Message-----
From: Harlan Andrews [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 21, 2002 4:40 PM
To: McGrath, Jim; '[EMAIL PROTECTED]'; [EMAIL PROTECTED]
Subject: RE: [t13] RAID and R/W LONG

>To my knowledge once a drive decides to reallocate, that is a non-reversible
>decision - you just used up a spare sector on the drive.  Do that often
>enough and the drive will fail (there are a limited number of spares).

Jim,

I repeat:  

Auto-relocation MUST NOT take place until valid data is available.
The non-recovered error should go into the "Pending" list (waiting for a
write or a recovered read).  Then, when the write occurs, the sector
from the "Pending" list should be tested first before re-assignment.
WriteLong should NEVER cause re-assignment.

When a "Pending" entry becomes available, there is a TEST of that block 
BEFORE relocation.  This prevents the relocation of "good" media.

WriteLong should NEVER cause re-assignment.    WriteLong does NOT waste 
spare blocks.
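The "Pending" policy described above can be sketched as a small Python model. This is a hypothetical illustration, not vendor firmware and not anything the ATA standard defines: the class PendingListDrive, its fields and its methods are invented for the example, a plain counter stands in for SMART error logging, and only the write path (not a recovered read) clears the pending list. Under these assumptions a WRITE LONG followed by a READ and a WRITE consumes no spare, yet the failed READ is still counted, which is the SMART side effect raised elsewhere in the thread.

```python
from dataclasses import dataclass, field


@dataclass
class PendingListDrive:
    # Hypothetical model of the "Pending" policy; names are illustrative only.
    spares: int
    bad_media: set = field(default_factory=set)  # LBAs whose media fails the pre-relocation test
    bad_ecc: set = field(default_factory=set)    # LBAs whose stored ECC is wrong (e.g. after WRITE LONG)
    pending: set = field(default_factory=set)    # unrecovered errors waiting for a write
    remapped: set = field(default_factory=set)   # LBAs now served from a spare sector
    logged_errors: int = 0                       # stands in for SMART error logging

    def write_long(self, lba: int) -> None:
        # Plant an uncorrectable error; no relocation decision happens here.
        self.bad_ecc.add(lba)

    def read(self, lba: int) -> bool:
        """Return True on success, False on an uncorrectable error."""
        failing = lba in self.bad_ecc or (lba in self.bad_media and lba not in self.remapped)
        if failing:
            self.logged_errors += 1  # the error was reported, so it is logged
            self.pending.add(lba)    # no valid data yet: defer any relocation
            return False
        return True

    def write(self, lba: int) -> None:
        if lba in self.pending:
            self.pending.discard(lba)
            # Test the media BEFORE relocating; good media keeps its home.
            if lba in self.bad_media and lba not in self.remapped:
                if self.spares == 0:
                    raise RuntimeError("out of spare sectors")
                self.spares -= 1
                self.remapped.add(lba)
        self.bad_ecc.discard(lba)    # fresh data is written with fresh ECC
```

For example, after write_long(10), read(10) fails and puts LBA 10 on the pending list; the following write(10) finds good media and leaves the spare count unchanged.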

...Harlan

---------------- Begin Forwarded Message ----------------
Date:        3/21/02 3:06 PM
Received:    3/21/02 4:05 PM
From:        McGrath, Jim, [EMAIL PROTECTED]
To:          '[EMAIL PROTECTED]', [EMAIL PROTECTED]
             [EMAIL PROTECTED]

Raymond,

You don't understand how auto reallocate works.  It has nothing to do with
error reporting.

When a drive thinks that the media in question is suspect, it "auto
reallocates" the data to another portion of media.  If the data was
readable, then the data is moved at that point.  If not, then the drive
remembers that the media is suspect and writes the data to the new section
of media when it gets the next write command.
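The behavior described above can be sketched as a small Python model. It is hypothetical and assumes a drive that reallocates as soon as it judges media suspect; EagerReallocDrive and its methods are invented names, and real reallocation policy is vendor specific. Because the model, like the drive in the thread, cannot tell a deliberately corrupted sector from failing media, each corrupted-then-rewritten sector permanently consumes a spare until none are left.

```python
from dataclasses import dataclass, field


@dataclass
class EagerReallocDrive:
    # Hypothetical model of eager auto reallocation; names are illustrative.
    spares: int
    unreadable: set = field(default_factory=set)      # LBAs that fail ECC on read
    awaiting_write: set = field(default_factory=set)  # suspect LBAs with no readable data
    remapped: set = field(default_factory=set)        # LBAs moved to a spare (irreversible)

    def _use_spare(self, lba: int) -> None:
        if self.spares == 0:
            raise RuntimeError("no spare sectors left")
        self.spares -= 1
        self.remapped.add(lba)

    def mark_suspect(self, lba: int) -> None:
        # Trigger is vendor specific: a host READ, an idle-time scan, etc.
        if lba in self.remapped or lba in self.awaiting_write:
            return
        if lba in self.unreadable:
            self.awaiting_write.add(lba)  # nothing valid to move yet
        else:
            self._use_spare(lba)          # data readable: move it now

    def read(self, lba: int) -> bool:
        if lba in self.unreadable:
            self.mark_suspect(lba)
            return False
        return True

    def write(self, lba: int) -> None:
        if lba in self.awaiting_write:
            self.awaiting_write.discard(lba)
            self._use_spare(lba)          # new data goes to the spare sector
        self.unreadable.discard(lba)
```

With two spares, corrupting three sectors, reading each and then rewriting them exhausts the spares on the third write.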

The drive's decision may be correlated to reporting an error to the host, but
may not be.  As an example, a drive could be performing a background scan of
the media during idle time, run into that sector, and at that time determine
that the media is suspect.  The key is that none of this is standardized.

To my knowledge once a drive decides to reallocate, that is a non-reversible
decision - you just used up a spare sector on the drive.  Do that often
enough and the drive will fail (there are a limited number of spares).

Jim

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 21, 2002 2:12 PM
To: [EMAIL PROTECTED]
Subject: RE: [t13] RAID and R/W LONG

Logically, the drive should not auto-reallocate when it encounters a read
error; otherwise, the host might read junk data and get "good status"
back.  Getting a read error is undesirable but acceptable (that is why
people use RAID to protect against it), but it is not acceptable for the
drive to output the wrong data and tell the host it is good.  That is data
corruption (instead of a data error).

Raymond Liu

-----Original Message-----
From: McGrath, Jim [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 21, 2002 1:40 PM
To: '[EMAIL PROTECTED]'; [EMAIL PROTECTED]
Subject: RE: [t13] RAID and R/W LONG

The issue on auto reallocation may be that some implementations would auto
reallocate on the subsequent READ of the sector.  The drive has no way of
knowing that this is a "good" sector that you artificially forced an error
into.  In general the details of auto reallocation policy are all vendor
specific.

Jim

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 21, 2002 12:13 PM
To: [EMAIL PROTECTED]
Subject: RE: [t13] RAID and R/W LONG

Creating a false uncorrectable error is only done at the very beginning of
using the drive as a RAID1 rebuild target (and only if necessary, i.e.
only when the source drive has reported an unrecoverable data block).  It
might affect the statistical data the drive collects a little bit (only the
drive guys can answer this).  Auto-relocation should not be affected because
this is not a normal write error.
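The rebuild procedure described above might look roughly like the following Python sketch. It is hypothetical: rebuild_mirror and its three callables are invented for illustration, and mark_uncorrectable merely stands in for issuing a WRITE LONG to the target; no real driver interface is implied. It shows that a false error is planted only at LBAs the source could not recover, so a later read of that LBA on the target returns an error rather than stale data with good status.

```python
from typing import Callable, List, Optional


def rebuild_mirror(
    num_lbas: int,
    read_source: Callable[[int], Optional[bytes]],  # hypothetical: None means unrecoverable
    write_target: Callable[[int, bytes], None],     # hypothetical: normal write to the target
    mark_uncorrectable: Callable[[int], None],      # hypothetical stand-in for WRITE LONG
) -> List[int]:
    """Copy every LBA from source to target; return the LBAs that were poisoned."""
    poisoned = []
    for lba in range(num_lbas):
        data = read_source(lba)
        if data is None:
            # Source lost this block: make the target report an error too,
            # instead of returning whatever stale data it holds.
            mark_uncorrectable(lba)
            poisoned.append(lba)
        else:
            write_target(lba, data)
    return poisoned
```

For instance, with a three-block source whose block 1 is unrecoverable, the function writes blocks 0 and 2 and returns [1].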

Raymond Liu

-----Original Message-----
From: Hale Landis [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 21, 2002 10:02 AM
To: T13 List Server
Subject: [t13] RAID and R/W LONG

On Thu, 21 Mar 2002 09:18:13 -0800, [EMAIL PROTECTED] wrote:
>[...] you might implement vendor specific commands to "address" that
>(which will keep the R/W Long still formally in "obsolete" state)?

Raymond, I think I asked a few days ago, but could you explain in
detail why/how you are using R/W LONG? Do you expect the command to
actually be passed to a drive behind a RAID controller or is the
command executed directly and only by the RAID controller? If the
command is used to create a false uncorrectable error on a real
drive, how do you then adjust for the possible effects on the drive's
SMART data or the drive's auto-relocation function?

*** Hale Landis *** www.ata-atapi.com ***

----------------- End Forwarded Message -----------------
