On Mon, Jun 22, 2015 at 01:19:59PM +0200, Richard Weinberger wrote:
> 
> > The bottom line is that if you care about files being written, you
> > need to use fsync().  Should git use fsync() by default?  Well, if you
> > are willing to accept that if your system crashes within a second or
> > so of your last git operation, you might need to run "git fsck" and
> > potentially recover from a busted repo, maybe speed is more important
> > for you (and git is known for its speed/performance, after all. :-)

I made a typo in the above.  s/second/minute/.  (Linux's writeback
timer is 30 seconds, but if the disk is busy it might take a bit
longer to get all of the data blocks written out to disk and
committed.)

> I think core.fsyncObjectFiles documentation really needs an update.
> What about this one?
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 43bb53c..b08fa11 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -693,10 +693,16 @@ core.whitespace::
>  core.fsyncObjectFiles::
>       This boolean will enable 'fsync()' when writing object files.
>  +
> -This is a total waste of time and effort on a filesystem that orders
> -data writes properly, but can be useful for filesystems that do not use
> -journalling (traditional UNIX filesystems) or that only journal metadata
> -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
> +For performance reasons git does not call 'fsync()' after writing object
> +files. This means that after a power cut your git repository can get
> +corrupted as not all data hit the storage media. Especially on modern
> +filesystems like ext4, xfs or btrfs this can happen very easily.
> +If you have to face power cuts and care about your data it is strongly
> +recommended to enable this setting.
> +Please note that git's behavior used to be safe on ext3 with data=ordered,
> +for any other filesystems or mount settings this is not the case as
> +POSIX clearly states that you have to call 'fsync()' to make sure that
> +all data is written.


My main complaint about this is that it's a bit Linux-centric.  For
example, the fact that fsync(2) is needed to push data out of the
cache is also true for MacOS (and indeed all other Unix systems going
back three decades) as well as Windows.  In fact, it's not a matter of
"POSIX says", but "POSIX documented", but since standards are held in
high esteem, it's sometimes a bit more convenient to use them as an
appeal to authority.  :-)
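
To make the point concrete, here is a minimal sketch in C of what
"use fsync()" means for a single file.  This is not git's actual
code; write_file_durably() is a hypothetical helper name I made up
for illustration, and it only uses the standard open(2), write(2),
fsync(2) and close(2) calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from git): write len bytes of buf to
 * path, and return 0 only after fsync() has reported success.
 * Returns -1 on any error.
 */
static int write_file_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;

	/* write() may be short or interrupted, so loop until done. */
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			close(fd);
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}

	/*
	 * Without this call the data may still be sitting in the page
	 * cache, waiting for the writeback timer, when the power goes out.
	 */
	if (fsync(fd) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```

Note that this only covers the file's own data.  For a newly created
file, making the directory entry durable generally also requires an
fsync() on the containing directory, which the sketch leaves out.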

(Ext3's data=ordered behaviour is an outlier, and in fact, the reason
why it is mostly safe to skip fsync(2) calls when using ext3 data=ordered
was an accidental side effect of another problem which it was trying to
solve based on the relatively primitive way it handled block
allocation.)

Cheers,

                                                - Ted