This message is from the T13 list server.
Harlan,

I agree (as I stated in the section of my message that you did not quote). However, that is a result of the drive error rate specification. If you are not careful you can return data that is in error without an error status (what we call a "buffer miscompare"). These actually will occur (it is the drive miscorrection error specification), but the rate is specified by vendors to be so low that you should never see it under normal use.

However, none of this has to do with the ATA standard per se. The ATA standard is entirely silent (as far as I can see) on the topic of defect management, and auto reallocation in particular. Indeed, you don't even need to do defect management to be ATA compliant (some early ATA drives relied on the host to handle defects).

So you should not start inducing errors via WRITE LONGs and assume the drive will somehow sort it all out - at least not for a drive that just obeys the normal error rate and ATA standards. Of course a specific product may work fine in this case, and you could always specify this behavior in a purchase specification (indeed, some customers do put defect management constraints into their specifications). But absent that, the ATA standard as written does not ensure that it will work properly.

Running out of spares is actually the least of the worries. Suppose you corrupt a lot of sectors, and then read them back (triggering errors)? You could trip all sorts of internal (and external) signals in the drive, causing side effects. SMART triggers have been pointed out as one (a READ of a sector that was corrupted with a WRITE LONG MUST be logged as an error, since the READ reported an error - 8.51.6.8.2.4 of ATA-6). Another could be lowering drive performance (i.e. we could try to slow things down in an attempt to reduce the number of "excessive" errors we are seeing). Basically the drive thinks it's failing, and so may end up doing a number of otherwise undesirable things in order to "save" the data.
This is especially dangerous since a lot of the drive READ/WRITE LONG implementations have probably been static for a long time, while drives acting smarter about data reliability issues is more recent.

If you are using READ LONG/WRITE LONG in a controlled testing environment, then this is probably not an issue. But using it for a field feature is dangerous if you just rely on the ATA standard.

Jim

-----Original Message-----
From: Harlan Andrews [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 21, 2002 4:40 PM
To: McGrath, Jim; '[EMAIL PROTECTED]'; [EMAIL PROTECTED]
Subject: RE: [t13] RAID and R/W LONG

>To my knowledge once a drive decides to reallocate, that is a non reversible
>decision - you just used up a spare sector on the drive. Do that often
>enough and the drive will fail (there are a limited number of spares).

Jim,

I repeat: Auto-relocation MUST not take place until valid data is available. The non-recovered error should go into the "Pending" list (waiting for a write or a recovered read). Then, when the write occurs, the sector from the "Pending" list should be tested first before re-assignment. WriteLong should NEVER cause re-assignment.

When a "Pending" entry becomes available, there is a TEST of that block BEFORE relocation. This prevents the relocation of "good" media. WriteLong should NEVER cause re-assignment. WriteLong does NOT waste spare blocks.

...Harlan

---------------- Begin Forwarded Message ----------------
Date: 3/21/02 3:06 PM
Received: 3/21/02 4:05 PM
From: McGrath, Jim, [EMAIL PROTECTED]
To: '[EMAIL PROTECTED]', [EMAIL PROTECTED] [EMAIL PROTECTED]

Raymond,

You don't understand how auto reallocate works. It has nothing to do with error reporting.

When a drive thinks that the media in question is suspect, it "auto reallocates" the data to another portion of media. If the data was readable, then the data is moved at that point.
If not, then the drive remembers that the media is suspect and writes the data to the new section of media when it gets the next write command.

The drive's decision may be correlated to reporting an error to the host, but may not be. As an example, a drive could be performing a background scan of the media during idle time, run into that sector, and at that time determine that the media is suspect. The key is that none of this is standardized.

To my knowledge once a drive decides to reallocate, that is a non reversible decision - you just used up a spare sector on the drive. Do that often enough and the drive will fail (there are a limited number of spares).

Jim

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 21, 2002 2:12 PM
To: [EMAIL PROTECTED]
Subject: RE: [t13] RAID and R/W LONG

Logically, the drive should not auto-reallocate when it encounters a read error; otherwise, the host might read junk data and get "good status" back. It is not desirable but acceptable to get a read error (that is why people use RAID to prevent that), but it is not acceptable that the drive outputs the wrong data and tells the host it is good. This is data corruption (instead of data error).

Raymond Liu

-----Original Message-----
From: McGrath, Jim [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 21, 2002 1:40 PM
To: '[EMAIL PROTECTED]'; [EMAIL PROTECTED]
Subject: RE: [t13] RAID and R/W LONG

The issue on auto reallocation may be that some implementations would auto reallocate on the subsequent READ of the sector. The drive has no way of knowing that this is a "good" sector that you artificially forced an error into.

In general the details of auto reallocation policy are all vendor specific.
Jim

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 21, 2002 12:13 PM
To: [EMAIL PROTECTED]
Subject: RE: [t13] RAID and R/W LONG

Creating a false uncorrectable error is only done at the very beginning of using the drive as a RAID1 rebuild target drive (and only if necessary, i.e. only when the source drive has reported an unrecoverable data block). It might affect the statistical data the drive has collected a little bit (only the drive guys can answer this). Auto-relocation should not be affected because this is not a normal write error.

Raymond Liu

-----Original Message-----
From: Hale Landis [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 21, 2002 10:02 AM
To: T13 List Server
Subject: [t13] RAID and R/W LONG

On Thu, 21 Mar 2002 09:18:13 -0800, [EMAIL PROTECTED] wrote:
>[...] you might implement
>vendor specific commands to "address" that
>(which will keep the R/W Long
>still formally in "obsolete" state)?

Raymond,

I think I asked a few days ago, but could you explain in detail why/how you are using R/W LONG? Do you expect the command to actually be passed to a drive behind a RAID controller, or is the command executed directly and only by the RAID controller? If the command is used to create a false uncorrectable error on a real drive, how do you then adjust for the possible effects on the drive's SMART data or the drive's auto-relocation function?

*** Hale Landis ***
www.ata-atapi.com ***

----------------- End Forwarded Message -----------------
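
The "Pending" list policy that Harlan argues for in this thread can be illustrated with a small toy model. This is only a sketch under stated assumptions: nothing in it is defined by the ATA standard, and, as Jim points out, real reallocation policies are vendor specific and may behave quite differently. The class `ToyDrive` and all of its fields and methods are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDrive:
    """Hypothetical model of a 'test before reallocating' policy.

    Not an ATA-defined behaviour and not any vendor's firmware.
    """
    spares: int = 4
    bad_media: set = field(default_factory=set)   # LBAs whose media is truly defective
    unreadable: set = field(default_factory=set)  # LBAs whose stored data fails ECC
    pending: set = field(default_factory=set)     # LBAs waiting for a write and a test
    remapped: set = field(default_factory=set)    # LBAs moved to a spare sector

    def _media_is_bad(self, lba):
        # A remapped LBA lives on a spare, which this model assumes is good.
        return lba in self.bad_media and lba not in self.remapped

    def write_long(self, lba):
        """Force an uncorrectable error into the sector (assumed effect)."""
        self.unreadable.add(lba)

    def read(self, lba):
        """Return True on good status; on error, queue the LBA as pending."""
        if lba in self.unreadable or self._media_is_bad(lba):
            self.pending.add(lba)
            return False
        return True

    def write(self, lba):
        """Write valid data; test a pending sector before using a spare."""
        if lba in self.pending:
            self.pending.discard(lba)
            if self._media_is_bad(lba):
                if self.spares == 0:
                    raise RuntimeError("out of spare sectors")
                self.spares -= 1
                self.remapped.add(lba)
        self.unreadable.discard(lba)


drive = ToyDrive(spares=2, bad_media={7})

drive.write_long(3)           # artificial error on good media
forced_read_ok = drive.read(3)
drive.write(3)                # media test passes, so no spare is consumed
spares_after_forced = drive.spares

real_read_ok = drive.read(7)  # genuine media defect
drive.write(7)                # media test fails, so the sector is remapped
spares_after_real = drive.spares
```

In this model the sector corrupted by the simulated WRITE LONG costs no spare, while the genuinely defective sector does. Note that the failed read of sector 3 is still an error the drive has seen, so side effects such as SMART logging, which Jim raises, are outside what this sketch covers.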
