At 2004-08-01T14:52:56+1200, Volker Kuhlmann wrote:
> Not if the disk makes some of the info available through the smart
> interface, as it seems to do.

I've already posted one reason why it may not be doing this.  Also, from
a quick scan of the SMART specs, it appears those fields are very vendor
specific, so you can't really make claims about the meaning of the
fields without referring to the SMART documentation for each vendor.
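To make the vendor-specificity concrete, here is a minimal sketch of pulling attribute IDs and raw values out of `smartctl -A`-style output. The sample text is made up for illustration, not from a real drive; the point is that this only collects the numbers -- interpreting a raw value still requires the vendor's own documentation.

```python
# Sketch: extract attribute ID, name, and raw value from smartctl -A
# style output. SAMPLE is illustrative; raw-value meanings are
# vendor-specific, so no interpretation is attempted here.

SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
"""

def parse_attributes(text):
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit():
            # id -> (name, raw value); the raw value is the last column
            attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs

print(parse_attributes(SAMPLE))
# {5: ('Reallocated_Sector_Ct', 12), 197: ('Current_Pending_Sector', 3)}
```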

> Read another 250 times. If one of them succeeds you have your data.

...and if none of those reads succeeds, the drive doesn't have the data
and cannot perform a sector remapping.
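The retry logic we're arguing about can be sketched like this. `read_sector` is a stand-in for the real device access (here it simulates a sector that recovers on the third attempt); the comment at the end is the point I'm making above.

```python
# Sketch of re-reading a failing sector. read_sector is a simulated
# stand-in for real device access; it recovers on the third attempt.

attempts_made = 0

def read_sector(lba):
    global attempts_made
    attempts_made += 1
    if attempts_made < 3:
        raise IOError("uncorrectable read error at LBA %d" % lba)
    return b"\x00" * 512  # recovered sector data

def read_with_retries(lba, retries=250):
    for attempt in range(retries):
        try:
            return read_sector(lba)
        except IOError:
            continue
    # Every attempt failed: the drive never got good data back, so it
    # has nothing to copy to a spare sector -- no remap on read.
    return None

data = read_with_retries(1234)
print(len(data))  # 512: the simulated sector recovered on attempt 3
```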

> Chances of detecting a write failure aren't that high - the
> magnetisation of the track would have to be seriously damaged, such
> that head tracking fails. If you just write out a stream of bits over a

Why would head tracking fail when a single sector is damaged?

> magnetic surface, you have no way of knowing if magnetistion (and
> therefore write) occured correctly until after you have read it back in
> again. Verify of all written blocks won't happen. Yes I have thought
> about it :) It's very similar to the old 5 1/4" floppies. You can
> write as much as you want, doesn't mean the data is correctly recorded.
> Does anyone notice? Nope, that's why floppies were copied with a verify
> run after writing. Takes twice as long.

The drive can detect a number of cases of write failure, and in fact
this is when most sector remappings are performed.  Today's IDE drives
are not your father's 5 1/4" floppies, and suffice it to say they do not
work as simply as you describe them.

> But it's plenty enough to do the block read test you suggested in that
> paragraph. :)

Which still doesn't tell you which file is affected.  Go back and read
what you wrote.
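A raw block-read test only yields an LBA; turning that into a filename takes filesystem knowledge. A rough sketch of the first step (example numbers, assuming 512-byte sectors and 4 KiB filesystem blocks): convert the LBA to a filesystem block number, then on ext2/ext3 use `debugfs` by hand (`icheck` to map block to inode, `ncheck` to map inode to name) to finish the job.

```python
# Sketch: map a bad LBA to a filesystem block number. The LBA and
# partition offset below are invented examples; block and sector
# sizes are assumptions, not universal.

SECTOR_SIZE = 512

def lba_to_fs_block(bad_lba, partition_start_lba, fs_block_size=4096):
    byte_offset = (bad_lba - partition_start_lba) * SECTOR_SIZE
    return byte_offset // fs_block_size

# Example: bad sector at LBA 123519, partition starts at LBA 63.
print(lba_to_fs_block(123519, 63))  # 15432
```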

> Well I thought that would be simple as: If 10 block transfers with DMA
> have gone through fine, it's pretty well not the DMA which is suddenly
> going to fix a media problem, is it? And if a bus reset is necessary,

I wonder why the driver is turning DMA off, then.  I guess it must be
stupid, or something.  Or, in fact, it's been done for a reason: to aid
the error recovery process.

And your heuristic is totally flawed, anyway.  I've got a machine here
with a buggy APIC that causes interrupts to behave badly.  I can do
heavy I/O for a couple of minutes (i.e. hundreds of block transfers a
second) without problems, and after a short while the IDE controller
starts losing interrupts.  When the kernel drops back to PIO mode, you
can (slowly) do I/O again.  This problem can happen at any time after
boot--the process of bringing the machine up to runlevel 3 doesn't cause
enough I/O for the problem to occur.
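You can spot this failure mode in the kernel log. A sketch, with made-up log lines -- the exact wording varies by kernel version, but "lost interrupt" and "DMA disabled" are the kind of messages to look for:

```python
# Sketch: scan dmesg-style output for the lost-interrupt / DMA-disabled
# symptoms described above. SAMPLE_LOG is invented for illustration;
# real message wording varies by kernel version.

SAMPLE_LOG = """\
hda: lost interrupt
hda: dma_timer_expiry: dma status == 0x21
hda: DMA disabled
ide0: reset: success
"""

def dma_trouble(log):
    lines = log.splitlines()
    lost = any("lost interrupt" in l for l in lines)
    disabled = any("DMA disabled" in l for l in lines)
    return lost and disabled

print(dma_trouble(SAMPLE_LOG))  # True
```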

> fine - but please restore to previous state. I know how to deal with
> DMA, Joe User doesn't. It's a usability issue.

DMA is re-enabled in some error recovery paths.  As I said before, it's
not really a usability issue, it's a hardware issue.  If DMA is off,
it's very likely off for a good reason.  If your example user complains
about performance (assuming they notice and become curious about it
instead of just rebooting their computer), people will point that user
to the kernel logs, they'll find an error and discover they have a
hardware (or media) problem, and go from there...  this chain of actions
leaves the end user _better_ off than if their machine appears to
continue running fine until the day their drive is dead.

> Sure. And the Linux kernel doesn't have and never had design and
> programming shortcomings, no.... Come off it. There are plenty of

Sure, the kernel has had design problems and programming errors.  That
doesn't mean that this behaviour is wrong, though.  Are you seriously
suggesting that you know better than all of the contributors to the IDE
core in the Linux kernel?  If so, go ahead and fix it--send
patches--save the world... everyone will thank you for it!

> I keep on hearing that the SCSI implementation is pretty bad,
> certainly wrt to error handling. And when an unreadable cd block
> deadlocks the kernel, then either IDE error recovery is somewhat
> shaky, or the drive is flawed. I'm not discounting either possibility
> (and the drive was not the cheapest).

IDE in general isn't the best design.  Most optical drives are awful
quality and only support older ATA standards.  These three things add up
to some very strange and annoying behaviour.

It's pretty seldom that you'll find a case of an unreadable CD causing
the kernel to deadlock (maybe you just mean "appear to hang"), and it's
unlikely to be the kernel's fault.  Perhaps you've seen it because
you're attaching hard disks and optical drives to the same IDE
channel--that configuration, combined with an unreadable CD, will make
it very difficult to do I/O to the hard disk, and that's the fault of
IDE, not the kernel.

> Perhaps this thread has reached EOL now...

Yes, unless you feel like reading t13.org before posting any further,
because this dueling speculation is veering further into the totally
pointless. ;-)

Cheers,
-mjg
-- 
Matthew Gregan                     |/
                                  /|                [EMAIL PROTECTED]
