On Tue, Nov 06, 2012 at 01:47:02PM +0000, Michael Kjörling wrote:
> On 6 Nov 2012 12:48 +0000, from h...@carfax.org.uk (Hugo Mills):
> >    There are also some caveats: while the FS should always be
> > consistent, the latest transaction write may not have been completed,
> > so you could potentially lose up to 30 seconds of writes to the FS
> > from immediately before the crash.
> 
> I'd rather lose the most recent 30 seconds of writes but have a
> consistent file system with as-consistent-as-can-be-expected data,
> than end up with a corrupted file system.
> 
> On that note; can this value be tuned currently, is it hardcoded, or
> is it stored in metadata somewhere but the tooling to tune it is not
> yet available?

   As far as I understand, no, it's hard-coded.

> >    If the FS does corrupt over a power failure, and the hardware can
> > be demonstrated to be good, then we have a bug that needs to be
> > tracked down. (There have been a number of these over the development
> > of the FS so far, but they do get fixed).
> 
> Is there a simple way to tell ahead of time whether the hardware meets
> the assumptions made by the file system with regards to write barriers
> etc.?

   "Most" hardware does. I think there's a "barriers disabled" warning
in the kernel logs on mounting the FS, and some time ago there were
rumours of a tool to check for it (from Red Hat, but I don't know if
it ever saw the light of day). That's all for the case where the
hardware explicitly states that it doesn't support barriers.
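   As a rough illustration of what checking for that warning might look
like (a sketch only -- the exact message text varies by kernel version
and filesystem, so the patterns below are examples, not an exhaustive
list):

```python
import re

# Case-insensitive patterns that commonly appear when the kernel falls
# back from write barriers. Exact wording differs between kernel
# versions and filesystems; treat these strings as examples only.
BARRIER_PATTERNS = [
    re.compile(r"disabling barriers", re.IGNORECASE),
    re.compile(r"barriers? disabled", re.IGNORECASE),
    re.compile(r"barrier-based sync failed", re.IGNORECASE),
]

def barrier_warnings(log_text):
    """Return the log lines that suggest barriers are not in effect."""
    hits = []
    for line in log_text.splitlines():
        if any(p.search(line) for p in BARRIER_PATTERNS):
            hits.append(line)
    return hits

# Typical usage: feed it the kernel log, e.g.
#   barrier_warnings(subprocess.check_output(["dmesg"], text=True))
```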

   More concerning is the out-of-spec hardware which claims to support
barriers and utterly fails to do so. I don't think there's much you
can do to detect that case, other than force failures and try to catch
it out -- then return it to the manufacturer under whatever consumer
protection laws you have, on the grounds that it's not fit for purpose.

   I think the number of disks that actually behave this way is fairly
small, but they are out there. I'm not aware of a blacklist/quirks
list for them.

> >    I guess the question for you is: are you after the _expected_
> > behaviour of the FS (should always be consistent on good hardware, but
> > you may lose up to 30 seconds of writes), or are you after mitigation
> > strategies in the face of FS bugs (keep off-site backups and be
> > prepared to use them)?
> 
> I already have full, daily on-site backups on an external drive that
> is logically unmounted except for when backups are running, as well as
> partial off-site backups to cloud storage - and of course, taking
> advantage of btrfs's snapshotting support there is no real reason why
> I couldn't increase the backup frequency while retaining data
> consistency. Losing half a minute of writes is fairly inconsequential
> for personal use as long as the file system remains consistent, and in
> the face of disastrous corruption it is at least possible to do a full
> restore to bare metal from rescue media and backup without losing too
> much. Not trivial time-wise (that's currently 1.4 TB over USB 2.0),
> but possible.

   OK, so I hope I've managed to answer your question satisfactorily.
Let us know if there are any outstanding queries you want cleared up. :)

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
    --- "I will not be pushed,  filed, stamped, indexed, briefed, ---    
               debriefed or numbered.  My life is my own."               
