Re: The FAQ on fsync/O_SYNC

Martin Steigerwald Sun, 19 Apr 2015 10:51:00 -0700

On Sunday, 19 April 2015, 15:18:51, Hugo Mills wrote:
> On Sun, Apr 19, 2015 at 05:10:30PM +0200, Martin Steigerwald wrote:
> > On Sunday, 19 April 2015, 22:31:02, Craig Ringer wrote:
> > > On 19 April 2015 at 22:28, Martin Steigerwald <mar...@lichtvoll.de> wrote:
> > > > On Sunday, 19 April 2015, 21:20:11, Craig Ringer wrote:
> > > >> Hi all
> > > > 
> > > > Hi Craig,
> > > > 
> > > >> I'm looking into the advisability of running PostgreSQL on BTRFS,
> > > >> and after looking at the FAQ there's something I'm hoping you could
> > > >> clarify.
> > > >> 
> > > >> The wiki FAQ says:
> > > >> 
> > > >> "Btrfs does not force all dirty data to disk on every fsync or
> > > >> O_SYNC operation, fsync is designed to be fast."
> > > >> 
> > > >> Is that wording intended narrowly, to contrast with ext3's nasty
> > > >> habit of flushing *all* dirty blocks for the entire file system
> > > >> whenever anyone calls fsync()? Or is it intended broadly, to say
> > > >> that btrfs's fsync won't necessarily flush all data blocks (just
> > > >> metadata)?
> > > >> 
> > > >> Is that statement still true in recent BTRFS versions (3.18, etc)?
> > > > 
> > > > I don't know, so I'll leave that for others to answer. I always
> > > > assumed a strong fsync() guarantee, as in "it's on disk", with
> > > > BTRFS. So I am interested in that as well.
> > > > 
> > > > But for databases, did you consider the copy-on-write
> > > > fragmentation BTRFS will give? Even with autodefrag, afaik it is
> > > > not recommended for large databases, on rotating media at least.
> > > 
> > > I did, and any testing would need to look at the efficacy of the
> > > chattr +C option on the database directory tree.
> > > 
> > > PostgreSQL is itself copy-on-write (because of multi-version
> > > concurrency control), so it doesn't make much sense to have the FS
> > > doing another layer of COW.
> > > 
> > > I'm curious as to whether +C has any effect on BTRFS's durability,
> > > too.
> > 
> > You will lose the ability to snapshot that directory tree then.
> 
>    No you won't.
> 
>    The +C attribute still allows snapshotting and reflink copies.
> However, after the snapshot, writes to either copy will result in that
> copy being CoWed. (Specifically, writes to an extent of a +C file with
> more than one reference to the extent will result in a CoW operation,
> until there is only one reference, and then the writes will not be
> CoWed again).
> 
>    The practical upshot of this is that every snapshot of, and
> subsequent writes to, a +C file will introduce fragmentation in the
> same way that writes to a non-+C file would.
> 
>    You also have a disadvantage with +C that you lose the checksumming
> features of the FS, and hence the self-healing properties if you're
> running with btrfs-native RAID.


Thanks for clarifying this, Hugo. So after a snapshot, writes to the
chattr +C files in that directory will be CoWed again.
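
To check whether a given file actually carries the attribute that
chattr +C sets, something like the following rough Python sketch might
do. It assumes Linux on an architecture with the generic ioctl encoding
(x86-64, arm64); the helper names has_nocow and read_inode_flags are made
up for illustration, and the constants are copied from <linux/fs.h>.

```python
import fcntl
import os
import struct

# Assumption: generic Linux ioctl encoding (x86-64, arm64, ...).
# <linux/fs.h> defines FS_IOC_GETFLAGS as _IOR('f', 1, long).
_IOC_READ = 2
FS_IOC_GETFLAGS = (
    (_IOC_READ << 30) | (struct.calcsize("l") << 16) | (ord("f") << 8) | 1
)
FS_NOCOW_FL = 0x00800000  # inode flag set by "chattr +C"


def has_nocow(flags):
    """Return True if the NOCOW bit is set in an inode flag word."""
    return bool(flags & FS_NOCOW_FL)


def read_inode_flags(path):
    """Return the inode flags of path, or None if the FS has no such ioctl."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # The kernel fills in a 32-bit flag word at the start of the buffer.
        buf = fcntl.ioctl(fd, FS_IOC_GETFLAGS, bytes(struct.calcsize("l")))
    except OSError:
        return None
    finally:
        os.close(fd)
    return struct.unpack("I", buf[:4])[0]
```

If has_nocow(read_inode_flags(path)) is true, the file is one of those
that, per Hugo's explanation, is no longer checksummed.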

And there is no checksumming on the FS at all anymore. Why is the latter?
Why can't BTRFS checksum nocowed objects, or at least the cowed ones in the
same FS? Because of atomicity guarantees?

If this has been answered before and I missed it, feel free to point me
to it. I didn't find anything obvious with my quick search.
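
For reference, the "it's on disk" expectation behind the fsync() question
quoted above is roughly the pattern below: a minimal Python sketch of an
append-then-fsync write, similar in spirit to what a database does for its
log. The function name append_and_sync is made up for illustration, and
whether the data is really durable once os.fsync() returns depends on the
filesystem and the storage underneath, which is exactly what the question
is about.

```python
import os


def append_and_sync(path, record):
    """Append record (bytes) to path and return only after fsync() succeeds."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        view = memoryview(record)
        # os.write() may write fewer bytes than requested, so loop.
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to flush this file's dirty data and metadata.
        os.fsync(fd)
    finally:
        os.close(fd)
```

The caller treats the record as committed only once the function returns
without raising.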

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
