The Anarcat wrote:
> > When you killed the power on your system and reset it, you
> > lost the cached data sitting in the ATA disk. This is due
> > to the fact that the ATA disk lied, and claimed that it had
> > committed some writes to stable storage, when in fact it had
> > only copied them to the disk cache. As a result, when the
> > device reset happened, you lost some writes which were in
> > progress. Therefore your disk image was corrupt, and so your
> > FS was *not* in a self-consistent state.
>
> Shouldn't fsck run in the foreground for disks set up with WC? That
> would be a quick hack solving this issue altogether.

There are a lot of "quick hacks" that can be done to solve the
issue. There are also real fixes:
o Disable BG fsck if WC is on; I dislike this hack,
mostly because of postings by drive engineers to
FreeBSD lists, indicating a willingness to address
ATA issues like this, and the fact that most SCSI
drives don't actually have this issue.
o Put a counter in the first superblock; it would be
incremented when the BG fsck is started, and reset
to zero when it completes. If the counter reaches
3 (or some command line specified number), then the
BG flagging is ignored, and a full FG fsck is then
performed instead. I like this idea because it will
always work, and it's not actually a hack, it's a
correct solution.
o Implement "soft read-only". The place that most of
the complaints are coming from is desktop users, with
relatively quiescent machines. Though swap is used,
it does not occur in an FS partition. As a result,
the FS could be marked "read-only" for long periods of
time. This marking would be in memory. The clean bit
would be set on the superblock. When a write occurs,
the clean bit would be reset to "dirty", and committed
to disk prior to the write operation being permitted
to proceed (a stall barrier). I like this idea because,
for the most part, it eliminates fsck, both BG and FG,
on systems that crash while it's in effect. The net
result is a system that is statistically much more
tolerant of failures, but which still requires another
safety net, such as the previous solution.
o Disk manufacturers could fix the ATA write caching
problem. I think this will happen eventually, so the
first "solution" is out.
o PC manufacturers could provide OS-usable NVRAM scratch
areas, which would permit an OS to allocate a section,
and use it. The OS would then write the FreeBSD marker
into an area to allocate it, and then write "power fail"
as the failure code into the allocated area. When a
panic or hardware failure occurred, it could write "panic"
or "hardware fail" as the failure code. When the system
came back up, it would be able to tell which type of failure
had occurred by reading the NVRAM area. If it was something
like "panic with sync", it could run the BG fsck, otherwise
it would run the FG fsck. I really like this idea, too. I
believe that more modern systems have this capability, but
it has not yet been standardized. Therefore we should take
a "wait and see" attitude towards it.
o Disk manufacturers could provide a Lithium battery on board
disks. This would not only bound their "planned obsolescence"
curve to 5 years or so (lifetime of the battery), it would
give them an aftermarket. The battery would trickle-charge
from the disk drive power, and would be used to commit the
write cache in event of power failure. I like this too; it
makes disk drives obsolete at about 2X the age at which they
currently become obsolete, it gives the drive manufacturers a bone for
playing along, and it actually solves the problem at its
source. People might not like "your disk lasts 5 years" vs.
"your warranty is one year", but smoothing the market demand
function is probably worth more, in terms of lower cost to
consumers and assured profit to disk manufacturers, and it
can be billed as a marketing checkbox item, to force all the
other disk manufacturers into implementing it, too, so there
should be no downside.
o We can change our file system structure to "journalled"; I like
this as well, but there are some issues with manufacturers who
do not provide track boundary information, which you need to assure
yourself that a track boundary doesn't span a corruption
boundary, in the event of a power failure. If you can do this,
journalling actually becomes incredibly fast, since you know
the disk writes backwards on a given track, so you can just
implement the "completed write" datestamp, and perform a single
write, instead of two writes, in order to get a track on the
disk.
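
To make the superblock counter idea from the list above concrete, here
is a rough sketch in Python rather than kernel C. Everything named in it
is an assumption made for illustration: the Superblock class, the
bg_fsck_attempts field, and the helper functions do not exist in FFS or
in fsck today. It only models the decision logic: bump the counter when
a BG fsck starts, zero it when a check completes, and fall back to a
full FG fsck once the limit (3 by default) is reached.

```python
from dataclasses import dataclass


@dataclass
class Superblock:
    # Hypothetical field: number of BG fsck runs started since the
    # last check that ran to completion.
    bg_fsck_attempts: int = 0


def choose_fsck_mode(sb: Superblock, bg_requested: bool,
                     max_attempts: int = 3) -> str:
    """Decide at boot whether to honor the BG flag or force an FG fsck."""
    if bg_requested and sb.bg_fsck_attempts < max_attempts:
        return "background"
    # Too many BG runs never finished (or BG was not requested):
    # ignore the BG flagging and do the full foreground check.
    return "foreground"


def start_bg_fsck(sb: Superblock) -> None:
    # In a real implementation this increment would have to reach the
    # disk before the background check begins.
    sb.bg_fsck_attempts += 1


def finish_fsck(sb: Superblock) -> None:
    # A completed check, BG or FG, resets the counter.
    sb.bg_fsck_attempts = 0


if __name__ == "__main__":
    sb = Superblock()
    for boot in range(1, 5):
        mode = choose_fsck_mode(sb, bg_requested=True)
        print(f"boot {boot}: {mode} fsck")
        if mode == "background":
            start_bg_fsck(sb)   # pretend the machine crashes before it ends
        else:
            finish_fsck(sb)     # the foreground check completes
```

In this model, three BG checks in a row that never finish cause the
fourth boot to run in the foreground, after which the counter starts
over.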
There are other approaches that I'm not prepared to share in a forum
where they might be made public, but you get the idea. Several of the
above are implementable now, particularly the counter and the soft
read-only, with a day or less of effort.
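
The soft read-only ordering can be sketched the same way. This is a toy
model, not FreeBSD code: the SoftReadOnlyFS class, its attribute names,
and the quiesce() step (re-marking the FS clean once it has gone idle)
are assumptions introduced for the example. The part it is meant to
show is the stall barrier: the first write after a quiet period must
put the "dirty" marking on disk before the data write is allowed to
proceed.

```python
class SoftReadOnlyFS:
    """Toy model of the soft read-only idea; all names are hypothetical."""

    def __init__(self) -> None:
        self.on_disk_clean = True    # clean bit in the on-disk superblock
        self.soft_read_only = True   # in-memory marking only
        self.disk_log = []           # order in which simulated I/O is issued

    def write(self, data: str) -> None:
        if self.soft_read_only:
            # Stall barrier: the dirty marking is committed before the
            # data write is permitted to proceed.
            self.on_disk_clean = False
            self.disk_log.append("superblock: dirty")
            self.soft_read_only = False
        self.disk_log.append(f"data: {data}")

    def quiesce(self) -> None:
        # Assumed to be called after an idle period, once all pending
        # writes have been flushed.
        self.on_disk_clean = True
        self.disk_log.append("superblock: clean")
        self.soft_read_only = True

    def needs_fsck_after_crash(self) -> bool:
        # A crash while the on-disk clean bit is set needs no fsck.
        return not self.on_disk_clean


if __name__ == "__main__":
    fs = SoftReadOnlyFS()
    print(fs.needs_fsck_after_crash())   # idle machine, nothing to check
    fs.write("block A")
    print(fs.disk_log)                   # dirty marking comes before the data
    fs.quiesce()
    print(fs.needs_fsck_after_crash())
```

Note that the model assumes the disk honors the ordering. A drive that
lies about having committed the superblock write defeats the barrier,
which is why this still needs another safety net such as the counter.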
-- Terry
_______________________________________________
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current