On Sun, 2016-06-05 at 21:07 +0000, Hugo Mills wrote:
>    The problem is that you can't guarantee consistency with
> nodatacow+checksums. If you have nodatacow, then data is overwritten,
> in place. If you do that, then you can't have a fully consistent
> checksum -- there are always race conditions between the checksum and
> the data being written (or the data and the checksum, depending on
> which way round you do it).
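To make the race concrete, here is a rough sketch (illustrative Python,
not btrfs code; all names and values are made up):

```python
import zlib

# Hypothetical model of a nodatacow overwrite: the data block is
# updated in place, while its checksum lives in (CoW'd) metadata.
# A crash can land between the two writes.

def csum(data: bytes) -> int:
    """Stand-in for the filesystem's per-block checksum."""
    return zlib.crc32(data)

# On-disk state: one data block plus its stored checksum.
block = b"old-contents"
stored_csum = csum(block)

# An in-place rewrite that "crashes" after the data write but before
# the checksum update -- the race described above:
block = b"new-contents"        # data overwritten in place
# -- crash here: stored_csum still describes the old data --

# After remount, verification flags the block as suspect:
assert csum(block) != stored_csum   # mismatch -> "it may be garbage"

# Without checksums the same crash would leave no signal at all, so a
# torn or inconsistent in-place write would go unnoticed; with
# checksums the worst case is a false positive on a block that was in
# fact written completely.
```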
I'm not an expert in the btrfs internals... but I had a pretty long
discussion back when I first brought this up, and everything that came
out of it - to my understanding - indicated that it should simply be
possible.

a) nodatacow just means "no data CoW", not "no metadata CoW". And isn't
the checksum data metadata? So AFAIU it is itself CoWed anyway.

b) What you refer to above is, AFAIU, that data may be written (not
CoWed) with no guarantee that the written data matches the checksum
(which may e.g. still be the old sum).
=> So what? This only happens in case of a crash/etc., and in that case
we have no idea anyway whether the overwritten block is consistent or
not, checksumming or no checksumming. With checksums we at least gain
the knowledge: it may be garbage.

The only "bad" thing that could happen would be: the block is fully
written and actually consistent, but the checksum hasn't been written
yet - IMHO much less likely than the other case(s). And I'd rather get
one false positive in a more unlikely case than corrupted blocks in all
other possible situations (silent block errors, etc. pp.).

And in principle, nothing would prevent a future btrfs from getting a
journal for the nodatacow-ed writes.

Look for the past thread "dear developers, can we have notdatacow +
checksumming, plz?"... I think I wrote about many more cases there, and
why - even if it may not be as good as datacow+checksumming - it would
still always be better to have checksumming with nodatacow.

> > Wasn't it said that autodefrag performs badly for anything larger
> > than ~1G?
>
>    I don't recall ever seeing someone saying that. Of course, I may
> have forgotten seeing it...
I think it was mentioned below this thread:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/50444/focus=50586
and also implied here:
http://article.gmane.org/gmane.comp.file-systems.btrfs/51399/match=autodefrag+large+files

> > Well, the fragmentation also has many other consequences, not just
> > seeks (assuming everyone would use SSDs, which is not and probably
> > won't be the case for quite a while).
> > Most obviously you get many more IOPS, and btrfs itself will, AFAIU,
> > also suffer from some issues due to the fragmentation.
>
>    This is a fundamental problem with all CoW filesystems. There are
> some mitigations that can be put in place (true CoW rather than
> btrfs's redirect-on-write, like some databases do, where the original
> data is copied elsewhere before overwriting; cache aggressively and
> with knowledge of the CoW nature of the FS, like ZFS does), but they
> all have their drawbacks and pathological cases.

Sure... but defrag (if it worked in general) or nodatacow (if it didn't
make you lose the ability to determine whether you're consistent or
not) would already be quite helpful here.

Cheers,
Chris.