This is an automated email from the ASF dual-hosted git repository.

vatamane pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/couchdb.git


The following commit(s) were added to refs/heads/main by this push:
     new 5ed0654ab Use fdatasync for commits
5ed0654ab is described below

commit 5ed0654abd69748d769043ed677ca46c655db84c
Author: Nick Vatamaniuc <[email protected]>
AuthorDate: Mon Jan 13 15:59:25 2025 -0500

    Use fdatasync for commits
    
    We can use fdatasync to save one write per sync call, for a total of two
    writes saved per commit, since each commit does two syncs: one for the data
    blocks up to the header, then another after the header.
    
    As of OTP 25 (our oldest supported version):
      * On Linux/BSDs: fdatasync()
      * On Windows: FlushFileBuffers() i.e. the same as for file:sync/1
      * On macOS: fcntl(fd,F_FULLFSYNC/F_BARRIERFSYNC)
    
    According to https://linux.die.net/man/2/fdatasync
    
     > fdatasync() is similar to fsync(), but does not flush modified metadata
     unless that metadata is needed in order to allow a subsequent data
     retrieval to be correctly handled. For example, changes to st_atime or
     st_mtime (respectively, time of last access and time of last
     modification; see stat(2)) do not require flushing because they are not
     necessary for a subsequent data read to be handled correctly. On the
     other hand, a change to the file size (st_size, as made by say
     ftruncate(2)), would require a metadata flush.
    
    The key things for us are:
    
      * It updates the size (positions) correctly
      * We do not rely on or care about atime/mtime for safety or correctness
      * Erlang VM does the right thing on all the supported OSes
---
 src/couch/src/couch_file.erl | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/couch/src/couch_file.erl b/src/couch/src/couch_file.erl
index 8616e039b..8c7370688 100644
--- a/src/couch/src/couch_file.erl
+++ b/src/couch/src/couch_file.erl
@@ -599,7 +599,12 @@ format_status(_Opt, [PDict, #file{} = File]) ->
 
 fsync(Fd) ->
     T0 = erlang:monotonic_time(),
-    Res = file:sync(Fd),
+    % We do not rely on mtime/atime for our safety/consistency so we can use
+    % fdatasync. As of version 25 OTP will use:
+    %  - On Linux/BSDs: fdatasync()
+    %  - On Windows: FlushFileBuffers() i.e. the same as for file:sync/1
+    %  - On macOS: fcntl(fd,F_FULLFSYNC/F_BARRIERFSYNC)
+    Res = file:datasync(Fd),
     T1 = erlang:monotonic_time(),
     % Since histograms can consume floating point values we can measure in
     % nanoseconds, then turn it into floating point milliseconds
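
The two-sync commit pattern described in the commit message (sync the data
blocks, write the header, sync again) can be sketched roughly as follows.
This is a minimal illustration only, not the couch_file implementation: the
module name commit_sketch and the function commit/3 are hypothetical, and the
file is opened in plain append mode for simplicity. The only OTP calls used
are file:open/2, file:write/2, file:datasync/1 and file:close/1.

```erlang
-module(commit_sketch).
-export([commit/3]).

%% Hypothetical helper (not part of CouchDB): append Data, sync, append
%% Header, sync again. Both syncs use file:datasync/1 instead of file:sync/1.
commit(Path, Data, Header) ->
    {ok, Fd} = file:open(Path, [append, raw, binary]),
    try
        ok = file:write(Fd, Data),
        % First sync: flush the data blocks written before the header.
        ok = file:datasync(Fd),
        ok = file:write(Fd, Header),
        % Second sync: flush the header itself.
        ok = file:datasync(Fd),
        ok
    after
        file:close(Fd)
    end.
```

Each commit in this sketch issues two datasync calls, which is where the
saving of two writes per commit mentioned above would come from on platforms
where file:datasync/1 maps to fdatasync().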
