On Sun, 2016-06-05 at 21:07 +0000, Hugo Mills wrote:
>    The problem is that you can't guarantee consistency with
> nodatacow+checksums. If you have nodatacow, then data is overwritten,
> in place. If you do that, then you can't have a fully consistent
> checksum -- there are always race conditions between the checksum and
> the data being written (or the data and the checksum, depending on
> which way round you do it).
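To make the race concrete, here is a rough sketch (illustrative Python,
not btrfs code; all names and values are made up):

```python
import zlib

# Hypothetical model of a nodatacow overwrite: the data block is
# updated in place, while its checksum lives in (CoW'd) metadata.
# A crash can land between the two writes.

def csum(data: bytes) -> int:
    """Stand-in for the filesystem's per-block checksum."""
    return zlib.crc32(data)

# On-disk state: one data block plus its stored checksum.
block = b"old-contents"
stored_csum = csum(block)

# An in-place rewrite that "crashes" after the data write but before
# the checksum update -- the race described above:
block = b"new-contents"        # data overwritten in place
# -- crash here: stored_csum still describes the old data --

# After remount, verification flags the block as suspect:
assert csum(block) != stored_csum   # mismatch -> "it may be garbage"

# Without checksums the same crash would leave no signal at all, so a
# torn or inconsistent in-place write would go unnoticed; with
# checksums the worst case is a false positive on a block that was in
# fact written completely.
```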
I'm not an expert in the btrfs internals... but I had a pretty long
discussion back when I first brought this up, and everything that came
out of it - to my understanding - indicated that it should simply be
possible.

a) nodatacow just means "no data CoW", not "no metadata CoW". And isn't
the checksum data metadata? So AFAIU it is itself CoWed anyway.

b) What you refer to above is, AFAIU, that data may be written (not
CoWed) with no guarantee that the written data matches the checksum
(which may e.g. still be the old sum).
=> So what? This only happens in case of a crash/etc., and in that case
we have no idea anyway whether the overwritten block is consistent or
not, checksumming or no checksumming. With checksums we at least gain
the knowledge: it may be garbage.

The only "bad" thing that could happen would be: the block is fully
written and actually consistent, but the checksum hasn't been written
yet - IMHO much less likely than the other case(s). And I'd rather get
one false positive in a more unlikely case than corrupted blocks in all
other possible situations (silent block errors, etc. pp.).

And in principle, nothing would prevent a future btrfs from getting a
journal for the nodatacow-ed writes.

Look for the past thread "dear developers, can we have notdatacow +
checksumming, plz?"... I think I wrote about many more cases there, and
why - even if it may not be as good as datacow+checksumming - it would
still always be better to have checksumming with nodatacow.

> > Wasn't it said that autodefrag performs badly for anything larger
> > than ~1G?
>
>    I don't recall ever seeing someone saying that. Of course, I may
> have forgotten seeing it...
I think it was mentioned below this thread:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/50444/focus=50586
and also implied here:
http://article.gmane.org/gmane.comp.file-systems.btrfs/51399/match=autodefrag+large+files

> > Well, the fragmentation also has many other consequences, not just
> > seeks (assuming everyone would use SSDs, which is not and probably
> > won't be the case for quite a while).
> > Most obviously you get many more IOPS, and btrfs itself will, AFAIU,
> > also suffer from some issues due to the fragmentation.
>
>    This is a fundamental problem with all CoW filesystems. There are
> some mitigations that can be put in place (true CoW rather than
> btrfs's redirect-on-write, like some databases do, where the original
> data is copied elsewhere before overwriting; cache aggressively and
> with knowledge of the CoW nature of the FS, like ZFS does), but they
> all have their drawbacks and pathological cases.

Sure... but defrag (if it worked in general) or nodatacow (if it didn't
make you lose the ability to determine whether you're consistent or
not) would already be quite helpful here.

Cheers,
Chris.