This message is from the T13 list server.


Don,

Yes, I did consider all of your points.

The ATA standard provides no way to disable auto reallocation - or anything
else related to defect management (look it up).  And while you can disable
SMART, you cannot disable any vendor specific functionality that is being
triggered by running into these READ errors (since they are, after all,
vendor specific).

Since you cannot control auto reallocation, and have no idea of what the
defect management is doing, you cannot use READ/WRITE LONG for testing any
of this, except for the extreme test of introducing a lot of errors and then
running out of spares, making the drive nonfunctional.  Similarly, since
you cannot access auto-reallocation or anything to do with it, you cannot
reverse an auto-reallocation.  Finally, this use of READ/WRITE LONG had all
of the same problems even before the commands were made obsolete.  It is
the lack of access to the defect management system, and the lack of knowing
what the drive does internally when it encounters these "artificial" errors,
that is the root cause of the problem.

Anyone can make any product they want, including one that does not comply
with the ATA standard.  And if you have a proprietary relationship with a
supplier that allows you to avoid all of these issues, then more power to
you.

But absent that, only knowing that the drive is ATA compliant, you should
not be doing any of this stuff.  Especially in a reliability sensitive
application, like RAID systems.

So yes, I did consider all of your points.

Jim


-----Original Message-----
From: don clay [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 21, 2002 6:22 PM
To: ata reflector
Subject: Re: RE: [t13] RAID and R/W LONG



Jim,

Did you ever consider the possibility that people who would want to use R/W
Long might, just might, turn off auto-reallocation and SMART when they are
using those commands?  That maybe, they would really be using this to
actually test the drive?  And that further, they might be able to also use
R/W Long to test auto-reallocation?  And stretching things even further,
maybe they have a way to reverse an auto-reallocation?  Just so that they
can make sure that auto-reallocation works the way that they intended.

And that maybe, they were doing this before R/W Long was obsoleted because
they made a business decision to do so.  And, in a clear violation of the
STANDARD, they (and who knows how many others) decided to continue
using it because it made sense for their company.  And maybe their customer
insisted that they have it.

Just wondering if you considered this.



3/21/02 6:08:22 PM, "McGrath, Jim" <[EMAIL PROTECTED]> wrote:

>
>Harlan,
>
>I agree (as I stated in the section of my message that you did not quote).
>However, that is a result of the drive error rate specification.  If you
>are not careful you can return data that is in error without an error
>status (what we call a "buffer miscompare").  These actually will occur (it
>is the drive miscorrection error specification), but the rate is specified
>by vendors to be so low that you should never see it under normal use.
>
>However, none of this has to do with the ATA standard per se.  The ATA
>standard is entirely silent (as far as I can see) on the topic of defect
>management, and auto reallocation in particular.  Indeed, you don't even
>need to do defect management to be ATA compliant (some early ATA drives
>relied on the host to handle defects).
>
>So you should not start inducing errors via WRITE LONGs and assume the
>drive will somehow sort it all out - at least not for a drive that just
>obeys the normal error rate and ATA standards.  Of course a specific product
>may work fine in this case, and you could always specify this behavior in a
>purchase specification (indeed, some customers do put defect management
>constraints into their specifications).  But absent that, the ATA standard
>as written does not ensure that it will work properly.
>
>Running out of spares is actually the least of the worries.  Suppose you
>corrupt a lot of sectors, and then read them back (triggering errors)?  You
>could trip all sorts of internal (and external) signals in the drive,
>causing side effects.  SMART triggers have been pointed out as one (a READ
>of a sector that was corrupted with a WRITE LONG MUST be logged as an error,
>since the READ reported an error - 8.51.6.8.2.4 of ATA-6).  Another could
>be lowering drive performance (i.e. we could try to slow things down in an
>attempt to reduce the number of "excessive" errors we are seeing).
>Basically the drive thinks it's failing, and so may end up doing a number of
>otherwise undesirable things in order to "save" the data.
>
>This is especially dangerous since a lot of the drive READ/WRITE LONG
>implementations have probably been static for a long time, and drives
>acting smarter in data reliability issues is more recent.
>
>If you are using READ LONG/WRITE LONG in a controlled testing environment,
>then this is probably not an issue.  But using it for a field feature is
>dangerous if you just rely on the ATA standard.
>
>Jim
>
>-----Original Message-----
>From: Harlan Andrews [mailto:[EMAIL PROTECTED]]
>Sent: Thursday, March 21, 2002 4:40 PM
>To: McGrath, Jim; '[EMAIL PROTECTED]'; [EMAIL PROTECTED]
>Subject: RE: [t13] RAID and R/W LONG
>
>
>>To my knowledge once a drive decides to reallocate, that is a non
>>reversible decision - you just used up a spare sector on the drive.  Do
>>that often enough and the drive will fail (there are a limited number of
>>spares).
>
>Jim,
>
>I repeat:  
>
>Auto-relocation MUST not take place until valid data is available.  
>The non-recovered error should go into the "Pending" list (waiting for a 
>write or a recovered read).   Then, when the write occurs, the sector 
>from the "Pending" list should be tested first before re-assignment.   
>WriteLong should NEVER cause re-assignment.
>
>When a "Pending" entry becomes available, there is a TEST of that block 
>BEFORE relocation.  This prevents the relocation of "good" media.
>
>WriteLong should NEVER cause re-assignment.    WriteLong does NOT waste 
>spare blocks.
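
The pending-list policy described above can be sketched as a toy model. This is only an illustration of the described behavior, not any vendor's firmware and not anything the ATA standard specifies. The class `PendingListDrive` and all of its methods and fields are hypothetical names invented for this sketch, and the media test is reduced to a simple set lookup.

```python
class PendingListDrive:
    """Toy model of a 'pending list' reallocation policy (hypothetical)."""

    def __init__(self, spares, bad_media=()):
        self.spares_left = spares          # finite pool of spare sectors
        self.bad_media = set(bad_media)    # LBAs whose media is really defective
        self.corrupt = set()               # LBAs whose data and ECC disagree
        self.pending = set()               # LBAs waiting for a write
        self.reassigned = set()            # LBAs moved to a spare

    def write_long(self, lba):
        """Store deliberately inconsistent data/ECC. Never reassigns."""
        self.corrupt.add(lba)

    def read(self, lba):
        """Return True on success; queue the LBA on an unrecovered error."""
        if lba in self.corrupt or lba in self.bad_media:
            self.pending.add(lba)
            return False
        return True

    def write(self, lba):
        """Normal write: test a pending block BEFORE deciding to reassign."""
        self.corrupt.discard(lba)          # fresh data gets a fresh ECC
        if lba in self.pending:
            self.pending.discard(lba)
            if lba in self.bad_media:      # media test failed
                if self.spares_left == 0:
                    raise RuntimeError("no spare sectors left")
                self.spares_left -= 1
                self.bad_media.discard(lba)  # LBA now maps to a good spare
                self.reassigned.add(lba)
```

In this model a sector corrupted with `write_long` goes onto the pending list when it is read, and the next normal write clears it without consuming a spare. Only a sector whose media fails the test is reassigned. Whether a given drive behaves this way is vendor specific.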
>
>...Harlan
>
>
>---------------- Begin Forwarded Message ----------------
>Date:        3/21/02 3:06 PM
>Received:    3/21/02 4:05 PM
>From:        McGrath, Jim, [EMAIL PROTECTED]
>To:          '[EMAIL PROTECTED]', [EMAIL PROTECTED]
>             [EMAIL PROTECTED]
>
>
>Raymond,
>
>You don't understand how auto reallocate works.  It has nothing to do with
>error reporting.
>
>When a drive thinks that the media in question is suspect, it "auto
>reallocates" the data to another portion of media.  If the data was
>readable, then the data is moved at that point.  If not, then the drive
>remembers that the media is suspect and writes the data to the new section
>of media when it gets the next write command.
>
>The drive's decision may be correlated to reporting an error to the host,
>but may not be.  As an example, a drive could be performing a background
>scan of the media during idle time, run into that sector, and at that time
>determine that the media is suspect.  The key is that none of this is
>standardized.
>
>To my knowledge once a drive decides to reallocate, that is a non
>reversible decision - you just used up a spare sector on the drive.  Do
>that often enough and the drive will fail (there are a limited number of
>spares).
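
The reallocation behavior described above can be sketched as a small simulation. It is a minimal model under stated assumptions, not how any particular drive works, since none of this is standardized. `SimulatedDrive`, `SpareExhausted`, and every method name are hypothetical and exist only for this illustration.

```python
class SpareExhausted(Exception):
    """Raised when the simulated drive has no spare sectors left."""


class SimulatedDrive:
    """Toy model of vendor-specific auto reallocation (hypothetical)."""

    def __init__(self, spares):
        self.spares_left = spares      # finite pool of spare sectors
        self.remapped = set()          # LBAs already moved to a spare
        self.suspect = set()           # LBAs waiting for the next write

    def mark_suspect(self, lba, readable):
        """The drive decides, for its own reasons, that the media is suspect."""
        if lba in self.remapped:
            return
        if readable:
            self._consume_spare(lba)   # data can be moved right away
        else:
            self.suspect.add(lba)      # remember it; move on the next write

    def write(self, lba):
        """A host write to a suspect sector lands on a spare."""
        if lba in self.suspect:
            self.suspect.discard(lba)
            self._consume_spare(lba)

    def _consume_spare(self, lba):
        if self.spares_left == 0:
            raise SpareExhausted("no spare sectors left")
        self.spares_left -= 1          # never given back: irreversible
        self.remapped.add(lba)
```

Note that `mark_suspect` is not tied to any error reported to the host. In this model it could be called from a background scan as easily as from a failed READ, and the host has no call that returns a spare to the pool.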
>
>Jim
>
>
>-----Original Message-----
>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
>Sent: Thursday, March 21, 2002 2:12 PM
>To: [EMAIL PROTECTED]
>Subject: RE: [t13] RAID and R/W LONG
>
>
>Logically, the drive should not auto-reallocate when it encounters a read
>error; otherwise, the host might read junk data and get "good status"
>back.  It is not desirable but acceptable to get a read error (that is why
>people use RAID to prevent that), but it is not acceptable that the drive
>output the wrong data and tell the host it is good.  This is data
>corruption (instead of data error).
>
>Raymond Liu
>
>-----Original Message-----
>From: McGrath, Jim [mailto:[EMAIL PROTECTED]]
>Sent: Thursday, March 21, 2002 1:40 PM
>To: '[EMAIL PROTECTED]'; [EMAIL PROTECTED]
>Subject: RE: [t13] RAID and R/W LONG
>
>
>The issue on auto reallocation may be that some implementations would auto
>reallocate on the subsequent READ of the sector.  The drive has no way of
>knowing that this is a "good" sector that you artificially forced an error
>into.  In general the details of auto reallocation policy are all vendor
>specific.
>
>Jim
>
>
>-----Original Message-----
>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
>Sent: Thursday, March 21, 2002 12:13 PM
>To: [EMAIL PROTECTED]
>Subject: RE: [t13] RAID and R/W LONG
>
>
>Creating a false uncorrectable error is only done at the very beginning of
>using the drive as a RAID1 rebuild target drive (and only if necessary, i.e.
>only when the source drive has reported an unrecoverable data block).  It
>might affect the statistical data the drive collects a little bit (only
>the drive guys can answer this).  Auto-relocation should not be affected
>because this is not a normal write error.
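
The rebuild step described above can be sketched roughly as follows. This is only an assumed outline of the controller logic, not Raymond's actual implementation. The function `rebuild_mirror` and its callback parameters are hypothetical, and `target_mark_bad` is a stand-in for whatever the controller would issue to the target drive (for example a WRITE LONG with mismatched ECC).

```python
def rebuild_mirror(source_read, target_write, target_mark_bad, num_blocks):
    """Copy every block to the target; poison blocks the source cannot read.

    source_read(lba) returns the block data, or None when the source drive
    reports an unrecoverable error for that block (an assumption made for
    this sketch).
    """
    poisoned = []
    for lba in range(num_blocks):
        data = source_read(lba)
        if data is None:
            # The data is lost on both mirrors, so make the target return a
            # read error instead of silently returning stale contents.
            target_mark_bad(lba)
            poisoned.append(lba)
        else:
            target_write(lba, data)
    return poisoned


# Usage with in-memory stand-ins for the two drives.
source = {0: b"a", 1: None, 2: b"c"}   # block 1 is unreadable on the source
target = {}
target_bad = set()
print(rebuild_mirror(source.get, target.__setitem__, target_bad.add, 3))
```

The sketch says nothing about how the target drive reacts to the poisoned block afterwards, which is the open question in this thread.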
>
>Raymond Liu
>
>-----Original Message-----
>From: Hale Landis [mailto:[EMAIL PROTECTED]]
>Sent: Thursday, March 21, 2002 10:02 AM
>To: T13 List Server
>Subject: [t13] RAID and R/W LONG
>
>
>On Thu, 21 Mar 2002 09:18:13 -0800, [EMAIL PROTECTED] wrote:
>>[...] you might implement vendor specific commands to "address" that
>>(which will keep the R/W Long still formally in "obsolete" state)?
>
>Raymond, I think I asked a few days ago, but could you explain in
>detail why/how you are using R/W LONG? Do you expect the command to
>actually be passed to a drive behind a RAID controller or is the
>command executed directly and only by the RAID controller? If the
>command is used to create a false uncorrectable error on a real
>drive, how do you then adjust for the possible effects on the drive's
>SMART data or the drive's auto-relocation function?
>
>
>
>*** Hale Landis *** www.ata-atapi.com ***
>
>----------------- End Forwarded Message -----------------
>
>
>

