Re: Not-so stable if you take a CAM error....

Karl Denninger Mon, 11 Jul 2016 10:32:22 -0700

On 7/11/2016 11:32, Ian Lepore wrote:
> On Mon, 2016-07-11 at 09:50 -0400, Brandon Allbery wrote:
>> On Mon, Jul 11, 2016 at 9:46 AM, Karl Denninger <[email protected]>
>> wrote:
>>
>>> Here's the backtrace ... sounds like expected behavior, which is
>>> not-so good all-in for a situation like this.  I guess the strategy
>>> is to turn off softupdates before attempting such an update so as
>>> not to crash the host machine if there's a problem with the card.
>>>
>> I would tend to assume that removable media should not have
>> softupdates enabled. Even with properly working media, it's
>> practically begging for corruption.
>>
> Writing to an sdcard without softupdates enabled will be an exercise in
> patience.  Like, come back next week and maybe it'll be done.
>
> The only thing that comes to mind with this is maybe some sort of mount
> flag to say you're willing to live with any amount of filesystem
> corruption in lieu of panicking.  I'm not sure how easy/practical that
> would be to implement, though.
>
> -- Ian

Why not force-detach the volume that takes the error instead of a panic()?

That would still lead to a panic if the detached volume was the system
volume (obviously), but a data volume would simply be forcibly unmounted
(and left dirty, so if it's corrupt that will get caught when it is
reattached).

It seems that the current paradigm of saying "screw you, panic the
machine" violates the principle of least astonishment and is overly
punitive vis-a-vis necessity.  Refusing further I/O because the volume
may now have a corrupt filesystem appears to be facially reasonable, but
that doesn't necessarily wind up being fatal to the system itself.  It
is if that's the system volume and it is not covered by some sort of
redundancy, obviously, but it's not in all cases.
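
To make the proposed policy concrete, here is a minimal sketch of the
decision in Python rather than kernel C.  Everything in it is hypothetical:
the Volume fields, the Action names and on_write_error() are illustrative
only and do not correspond to any actual FreeBSD kernel interface.  Treating
a redundant system volume as detachable is also an assumption drawn from the
redundancy remark above, not existing behavior.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PANIC = "panic"                # the system cannot continue
    FORCE_DETACH = "force-detach"  # drop the volume and leave it dirty


@dataclass
class Volume:
    name: str
    is_system: bool          # hypothetical: volume holds the root filesystem
    redundant: bool = False  # hypothetical: another copy exists (e.g. a mirror)
    attached: bool = True
    dirty: bool = False


def on_write_error(vol: Volume) -> Action:
    """Decide what to do after an unrecoverable write error on vol."""
    if vol.is_system and not vol.redundant:
        # Losing the only copy of the system volume is fatal either way.
        return Action.PANIC
    # Otherwise refuse further I/O by detaching the whole volume, and leave
    # it marked dirty so a check is forced before it is used again.
    vol.attached = False
    vol.dirty = True
    return Action.FORCE_DETACH
```

With this sketch, a failing SD card used as a data volume ends up detached
and dirty, while an unmirrored root volume still yields a panic.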

(Note that you can't just unmount the filesystem involved in the error;
it has to be the volume that gets forcibly detached, and whatever flows
through from that you have to live with.  The reason is that on any sort
of solid-state media the OS has zero control over zoning, and write
amplification means far more than the data you were actually modifying
may have been lost.  It's entirely possible that *several megabytes* of
data just got trashed by the write error, and it's even possible that
the block(s) involved cross a filesystem boundary!)
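
As a rough illustration of the boundary-crossing point, the sketch below
computes which partitions overlap the erase block containing a failed write.
It rests on loud simplifying assumptions: the 4 MiB erase block size and the
partition layout are made up, and it pretends that logical offsets map
directly onto physical erase blocks, which a real flash translation layer
does not guarantee.  The names Partition and affected_partitions() are
hypothetical.

```python
from typing import List, NamedTuple

MiB = 1024 * 1024


class Partition(NamedTuple):
    name: str
    start: int  # first byte offset on the device
    end: int    # one past the last byte offset


def affected_partitions(offset: int, erase_block: int,
                        parts: List[Partition]) -> List[str]:
    """Return names of partitions overlapping the erase block at offset."""
    # Round down to the start of the (assumed) erase block.
    block_start = (offset // erase_block) * erase_block
    block_end = block_start + erase_block
    # Half-open interval overlap test against each partition.
    return [p.name for p in parts
            if p.start < block_end and p.end > block_start]


# Made-up layout: the boundary at 66 MiB is not erase-block aligned.
layout = [Partition("s1", 1 * MiB, 66 * MiB),
          Partition("s2", 66 * MiB, 1024 * MiB)]

print(affected_partitions(65 * MiB, 4 * MiB, layout))  # ['s1', 's2']
print(affected_partitions(10 * MiB, 4 * MiB, layout))  # ['s1']
```

In the first call the write lands in s1, but the assumed erase block spans
64 MiB to 68 MiB and so also covers the start of s2, which is why unmounting
only the filesystem that reported the error would not be enough.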

-- 
Karl Denninger
[email protected] <mailto:[email protected]>
/The Market Ticker/
/[S/MIME encrypted email preferred]/
