On Wed, Jan 17 2018, Junio C. Hamano jotted:
> Christoph Hellwig <[email protected]> writes:
>
>> fsync is required for data integrity as there is no gurantee that
>> data makes it to disk at any specified time without it. Even for
>> ext3 with data=ordered mode the file system will only commit all
>> data at some point in time that is not guaranteed.
>
> It comes from this one:
>
> commit aafe9fbaf4f1d1f27a6f6e3eb3e246fff81240ef
> Author: Linus Torvalds <[email protected]>
> Date: Wed Jun 18 15:18:44 2008 -0700
>
> Add config option to enable 'fsync()' of object files
>
> As explained in the documentation[*] this is totally useless on
> filesystems that do ordered/journalled data writes, but it can be a
> useful safety feature on filesystems like HFS+ that only journal the
> metadata, not the actual file contents.
>
> It defaults to off, although we could presumably in theory some day
> auto-enable it on a per-filesystem basis.
>
> [*] Yes, I updated the docs for the thing. Hell really _has_ frozen
> over, and the four horsemen are probably just beyond the horizon.
> EVERYBODY PANIC!
>
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index 0e25b2c92..9a1cec5c8 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -866,10 +866,8 @@ core.whitespace::
>> core.fsyncObjectFiles::
>> This boolean will enable 'fsync()' when writing object files.
>> +
>> -This is a total waste of time and effort on a filesystem that orders
>> -data writes properly, but can be useful for filesystems that do not use
>> -journalling (traditional UNIX filesystems) or that only journal metadata
>> -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
>> +This option is enabled by default and ensures actual data integrity
>> +by calling fsync after writing object files.
>
> I am somewhat sympathetic to the desire to flip the default to
> "safe" and allow those who know they are already safe to tweak the
> knob for performance, and it also makes sense to document that the
> default is "true" here. But I do not see the point of removing the
> four lines from this paragraph; the sole effect of the removal is to
> rob information from readers that they can use to decide if they
> want to disable the configuration, no?
[CC'd the author of the current behavior]
Some points/questions:
a) Is there some reliable way to test whether this is needed from
userspace? I'm thinking something like `git update-index
--test-untracked-cache` but for fsync().
b) On the filesystems that don't need this, what's the performance
impact?
I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered
on the tests I thought might do a lot of loose object writes:
$ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux
GIT_PERF_MAKE_OPTS="NO_OPENSSL=Y CFLAGS=-O3 -j56" ./run origin/master fsync-on~
fsync-on p3400-rebase.sh p0007-write-cache.sh
[...]
Test fsync-on~
fsync-on
-------------------------------------------------------------------------------------------------------
3400.2: rebase on top of a lot of unrelated changes
1.45(1.30+0.17) 1.45(1.28+0.20) +0.0%
3400.4: rebase a lot of unrelated changes without split-index
4.34(3.71+0.66) 4.33(3.69+0.66) -0.2%
3400.6: rebase a lot of unrelated changes with split-index
3.38(2.94+0.47) 3.38(2.93+0.47) +0.0%
0007.2: write_locked_index 3 times (3214 files)
0.01(0.00+0.00) 0.01(0.00+0.00) +0.0%
No impact. However I did my own test of running the test suite 10%
times with/without this patch, and it runs 9% slower:
fsync-off: avg:21.59 21.50 21.50 21.52 21.53 21.54 21.57 21.59 21.61 21.63
21.95
fsync-on: avg:23.43 23.21 23.25 23.26 23.26 23.27 23.32 23.49 23.51 23.83
23.88
Test script at the end of this E-Mail.
c) What sort of guarantees in this regard do NFS-mounted filesystems
commonly make?
Test script:
use v5.10.0;
use strict;
use warnings;
use Time::HiRes qw(time);
use List::Util qw(sum);
use Data::Dumper;
my %time;
for my $ref (@ARGV) {
system "git checkout $ref";
system qq[make -j56 CFLAGS="-O3 -g" NO_OPENSSL=Y all];
for (1..10) {
my $t0 = -time();
system "(cd t && NO_SVN_TESTS=1 GIT_TEST_HTTPD=0 prove -j56
--state=slow,save t[0-9]*.sh)";
$t0 += time();
push @{$time{$ref}} => $t0;
}
}
for my $ref (sort keys %time) {
printf "%20s: avg:%.2f %s\n",
$ref,
sum(@{$time{$ref}})/@{$time{$ref}},
join(" ", map { sprintf "%.02f", $_ } sort { $a <=> $b }
@{$time{$ref}});
}