The Anarcat wrote:
> > When you killed the power on your system and reset it, you
> > lost the cached data sitting in the ATA disk.  This is due
> > to the fact that the ATA disk lied, and claimed that it had
> > committed some writes to stable storage, when in fact it had
> > only copied them to the disk cache.  As a result, when the
> > device reset happened, you lost some writes which were in
> > progress.  Therefore your disk image was corrupt, and so your
> > FS was *not* in a self-consistent state.
> 
> Shouldn't fsck run in the foreground for disks set up with WC? That
> would be a quick hack solving this issue altogether.

There are a lot of "quick hacks" that can be done to solve the
issue.  There are also real fixes:

o       Disable BG fsck if WC is on; I dislike this hack,
        mostly because of postings by drive engineers to
        FreeBSD lists, indicating a willingness to address
        ATA issues like this, and the fact that most SCSI
        drives don't actually have this issue.

o       Put a counter in the first superblock; it would be
        incremented when the BG fsck is started, and reset
        to zero when it completes.  If the counter reaches
        3 (or some command line specified number), then the
        BG flagging is ignored, and a full FG fsck is then
        performed instead.  I like this idea because it will
        always work, and it's not actually a hack, it's a
        correct solution.

o       Implement "soft read-only".  The place that most of
        the complaints are coming from is desktop users, with
        relatively quiescent machines.  Though swap is used,
        it does not occur in an FS partition.  As a result,
        the FS could be marked "read-only" for long periods of
        time.  This marking would be in memory.  The clean bit
        would be set on the superblock.  When a write occurs,
        the clean bit would be reset to "dirty", and committed
        to disk prior to the write operation being permitted
        to proceed (a stall barrier).  I like this idea because,
        for the most part, it eliminates fsck, both BG and FG,
        on systems that crash while it's in effect.  The net
        result is a system that is statistically much more
        tolerant of failures, but which still requires another
        safety net, such as the previous solution.

o       Disk manufacturers could fix the ATA write caching
        problem.  I think this will happen eventually, so the
        first "solution" is out.

o       PC manufacturers could provide OS-usable NVRAM scratch
        areas, which would permit an OS to allocate a section,
        and use it.  The OS would then write the FreeBSD marker
        into an area to allocate it, and then write "power fail"
        as the failure code into the allocated area.  When a
        panic or hardware failure occurred, it could write "panic"
        or "hardware fail" as the failure code.  When the system
        came back up, it would be able to tell which type of failure
        had occurred by reading the NVRAM area.  If it was something
        like "panic with sync", it could run the BG fsck, otherwise
        it would run the FG fsck.  I really like this idea, too.  I
        believe that more modern systems have this capability, but
        it has not yet been standardized.  Therefore we should take
        a "wait and see" attitude towards it.

o       Disk manufacturers could provide a Lithium battery on board
        disks.  This would not only bound their "planned obsolescence"
        curve to 5 years or so (lifetime of the battery), it would
        give them an aftermarket.  The battery would trickle-charge
        from the disk drive power, and would be used to commit the
        write cache in the event of power failure.  I like this too; it
        makes disk drives obsolete at about 2X the age at which they
        become obsolete now, it gives the drive manufacturers a bone for
        playing along, and it actually solves the problem at its
        source.  People might not like "your disk lasts 5 years" vs.
        "your warranty is one year", but smoothing the market demand
        function is probably worth more, in terms of lower cost to
        consumers and assured profit to disk manufacturers, and it
        can be billed as a marketing checkbox item, to force all the
        other disk manufacturers into implementing it, too, so there
        should be no downside.

o       We can change our file system structure to "journalled"; I like
        this as well, but there are some issues with manufacturers who
        do not provide track boundary information, which you need in
        order to assure yourself that a track boundary doesn't span a
        corruption boundary in the event of a power failure.  If you
        can do this, journalling actually becomes incredibly fast,
        since you know the disk writes backwards on a given track, so
        you can just implement the "completed write" datestamp, and
        perform a single write, instead of two writes, in order to get
        a track on the disk.
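
As a rough illustration of the superblock counter item above, here is a
minimal Python sketch of the boot-time decision.  The Superblock class,
its field names, and the helper functions are all hypothetical; this is
not the real FFS superblock layout or the fsck interface, just the
counting logic under those assumptions.

```python
from dataclasses import dataclass

# Hypothetical default; the text suggests 3 or a command-line value.
MAX_BG_ATTEMPTS = 3


@dataclass
class Superblock:
    """Toy stand-in for the first superblock (field names are assumed)."""
    clean: bool = False
    bg_fsck_attempts: int = 0


def choose_fsck_mode(sb, bg_allowed, limit=MAX_BG_ATTEMPTS):
    """Return "none", "background", or "foreground" for this boot."""
    if sb.clean:
        return "none"
    # Once the counter reaches the limit, the BG flagging is ignored.
    if not bg_allowed or sb.bg_fsck_attempts >= limit:
        return "foreground"
    return "background"


def start_bg_fsck(sb):
    # Incremented before the check begins, so a crash during the
    # check still leaves the attempt recorded.
    sb.bg_fsck_attempts += 1


def finish_fsck(sb):
    # Reset only when a check actually runs to completion.
    sb.bg_fsck_attempts = 0
```

A machine that keeps crashing before the BG fsck completes would take
the "background" branch three times and then fall through to
"foreground" on the next boot.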

There are other approaches that I'm not prepared to share in a forum
where they might be made public, but you get the idea.  Several of the
above are implementable now, particularly the counter and the soft
read-only, with a day or less of effort.
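
To make the soft read-only idea concrete, here is a small Python
sketch of the ordering it depends on.  Everything in it is an
assumption for illustration: SoftReadOnlyFS, its dict standing in for
stable storage, and the commits list that records the order in which
things reach the "disk".  It models only the stall barrier, not a real
file system.

```python
class SoftReadOnlyFS:
    """Toy model of "soft read-only"; all names here are hypothetical."""

    def __init__(self):
        self.disk = {"clean": False}  # stand-in for stable storage
        self.commits = []             # order in which keys reached "disk"
        self.soft_ro = False          # in-memory marking only

    def _commit(self, key, value):
        self.disk[key] = value
        self.commits.append(key)

    def go_quiescent(self):
        # Assumes all outstanding writes have already been flushed.
        self._commit("clean", True)
        self.soft_ro = True

    def write(self, key, data):
        if self.soft_ro:
            # Stall barrier: the dirty marking is committed before the
            # write itself is permitted to proceed.
            self._commit("clean", False)
            self.soft_ro = False
        self._commit(key, data)


def needs_fsck(disk):
    """After a crash, only a dirty superblock calls for fsck."""
    return not disk["clean"]
```

A crash while the FS is quiescent finds the clean bit set, so no fsck
is needed; a crash after the first write finds it dirty.  The sketch
assumes the clean-bit write really reaches stable storage, which is why
the counter is still wanted as a safety net.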

-- Terry
_______________________________________________
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
