This message is from the T13 list server.


I imagine that if you feel I am provoking you to "!", then I am misunderstanding ...


I specifically saw the "Amen!", but I think I'm missing something key. I see nothing else except the reiteration of the principle, with which I entirely agree, theoretically.

Meanwhile, back in the real world, in some systems, I do see so-called error cases occurring commonly. When they do occur commonly, they factor significantly into throughput.

How can a fact like that be controversial?

Isn't shipping any theoretical design assumption, by definition, a way of filtering for the cases that actually violate that assumption?

A recent & concrete example is:

Subject: [usb-storage] Re: [Linux-usb-users] Data phase error not solved in 2.6.9-final
http://lists.one-eyed-alien.net/pipermail/usb-storage/2004-October/001104.html


Windows was tolerating a non-repeating hard x 4 4B "data phase" error; that other operating system was not. We blew a certain amount of time discovering that the error also occurred under Windows. Windows just didn't care.

Pat LaVarre

P.S. Specifically, the conversation that I saw seemingly fail to progress here was:

In reading the proposal, it seems that there is an issue, enough so as to warrant writing a proposal. If the state does occur as the proposal describes, then the bad data is read to clear DRQ (a performance issue), and, as you comment, a soft reset is issued and the command re-issued. How long does a soft reset take (another performance hit)?
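
For readers without the proposal at hand, here is a minimal sketch of that recovery sequence on a legacy PIO taskfile interface. The port addresses and status bits are the conventional ATA ones; the inb()/outb()/delay_us() helpers and the polling policy are assumptions for illustration, not text from the proposal. The drain loop and the reset loop are the two costs the question above is weighing.

#include <stdint.h>

/* Conventional legacy primary-channel ATA ports and status bits. */
#define ATA_REG_DATA    0x1F0   /* 16-bit PIO data register */
#define ATA_REG_STATUS  0x1F7   /* status register          */
#define ATA_REG_DEVCTL  0x3F6   /* device control register  */

#define ATA_ST_BSY      0x80    /* device busy              */
#define ATA_ST_DRQ      0x08    /* data request             */
#define ATA_CTL_SRST    0x04    /* soft reset bit           */

/* Hypothetical port-I/O and delay helpers, assumed for the sketch. */
extern uint8_t  inb(uint16_t port);
extern uint16_t inw(uint16_t port);
extern void     outb(uint8_t val, uint16_t port);
extern void     delay_us(unsigned long us);

/* Step 1: read and discard the bad data until the device drops DRQ. */
static void drain_drq(void)
{
    while (inb(ATA_REG_STATUS) & ATA_ST_DRQ)
        (void)inw(ATA_REG_DATA);
}

/* Step 2: pulse SRST, then poll until BSY clears before re-issuing
 * the failed command.  This polling loop is where the "how long does
 * a soft reset take" cost lives. */
static void soft_reset(void)
{
    outb(ATA_CTL_SRST, ATA_REG_DEVCTL);
    delay_us(5);                    /* SRST pulse width, >= 5 us  */
    outb(0, ATA_REG_DEVCTL);
    delay_us(2000);                 /* let the device begin reset */
    while (inb(ATA_REG_STATUS) & ATA_ST_BSY)
        delay_us(1000);
}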

When you are getting into error states, you may assume that performance has _already_ gone out the window.


When writing error handling code, focus on _correctness_ not performance IMO.
In principle I agree ... except that sometimes correctness and performance become available together. I also remember anecdotes like USB DVD/CD burners creating coasters because the host's handling of unexpectedly short data was slow, and HDDs appearing slower in one operating system than in another, because one of the operating systems covered up non-repeating errors more rapidly ...
Does your personal history of pain include no such episodes?

Pardon the use of exclamation points; I normally abstain from the practice, but we are talking about error handling here!


We don't need every ATA driver stack "optimizing" its error handling. We need to encourage software authors to write software that handles errors in the same, boring manner as everyone else.

We're not talking about data transfer here.
Data transfer is the hot path, the part you optimize.

Error handling is the 0.1% case that you _must_ get right. Performance considerations do nothing but increase complexity. You get the performance that correctness provides, and nothing more :)
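
To make "boring" concrete: an error path along these lines (a fixed retry budget, a reset back to a known state, no cleverness) is roughly the shape being argued for. The names here, issue_command(), soft_reset_device(), and struct cmd, are hypothetical, for illustration only.

struct cmd;                         /* opaque command descriptor */

/* Hypothetical driver hooks, assumed for the sketch. */
extern int  issue_command(const struct cmd *c);   /* 0 on success */
extern void soft_reset_device(void);

enum { MAX_RETRIES = 3 };

/* The boring path: on any failure, return the device to a known
 * state and retry a fixed number of times, then report failure.
 * No special cases, no fast paths in the error leg. */
int issue_with_recovery(const struct cmd *c)
{
    int attempt;

    for (attempt = 0; attempt <= MAX_RETRIES; attempt++) {
        if (issue_command(c) == 0)
            return 0;               /* success */
        soft_reset_device();        /* correctness first */
    }
    return -1;                      /* hard failure: surface it */
}

The fixed retry budget keeps the error leg bounded and predictable; any speed it has comes from being simple.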


