On 12/02/18 17:35, Terry Barnaby wrote:
On 12/02/18 17:15, J. Bruce Fields wrote:
On Mon, Feb 12, 2018 at 05:09:32PM +0000, Terry Barnaby wrote:
One thing on this that I forgot to ask: doesn't fsync() work properly with
an NFS server-side async mount, then?

If a server sets "async" on an export, there is absolutely no way for a
client to guarantee that data reaches disk, or to know when it happens.

Possibly "ignore_sync", or "unsafe_sync", or something else, would be a
better name.


Well, that seems like a major shortcoming; I always thought that fsync() would work in this case. I don't understand why fsync() should not operate as intended. Sounds like this NFS async thing needs some work!

I still do not understand why NFS doesn't operate in the same way as a standard mount on this. As far as I know, async is only there to improve performance by hiding disk write latency and limited disk speed (or are there other reasons?).

So with a local system mount:

async (the normal mode): all system calls manipulate the in-memory copies of the disk structures (inodes etc.). Data/metadata is flushed to disk on fsync(), sync() and occasionally by the kernel. A process's data is not actually stored until fsync(), sync() etc.

sync (with the sync option): data/metadata is written to disk before system calls return (all FS system calls?).

With an NFS mount I would have thought it should be the same.

async (the normal mode): all system calls manipulate the in-memory copies of the disk structures (inodes etc.). These would normally live on the server (so multiple clients can work with the same data), but with some options (for particular usage) client-side write buffering/caching could perhaps be used, i.e. data would not actually pass to the server on every FS system call. Data/metadata is flushed to the server's disk on fsync(), sync() and occasionally by the kernel (if client-side write caching is used, this flushes across the network and then flushes the server's buffers). A process's data is not actually stored until fsync(), sync() etc.

sync (with the client-side sync option): data/metadata is written across NFS and to the server's disk before system calls return (all FS system calls?).
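
For reference, the client-side pattern described above (buffered writes, then an explicit flush) can be sketched in C roughly as follows. This is only a minimal sketch: write_durable() is a hypothetical helper, not part of the test program further down, and passing O_SYNC in extra_flags approximates the per-call "sync" behaviour for a single file. Per the reply quoted above, none of this can guarantee anything when the server export is "async", since the server may acknowledge the flush before the data reaches its disk.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the test program): write len bytes to path
 * and return 0 only once fsync() and close() have both reported success.
 * Pass O_SYNC in extra_flags to get per-write synchronous behaviour. */
int write_durable(const char *path, const void *data, size_t len, int extra_flags)
{
    assert(path != NULL && (data != NULL || len == 0));

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0666);
    if (fd < 0)
        return -1;

    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before writing: retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;                     /* short write: carry on from here */
        len -= (size_t)n;
    }

    /* Ask for the buffered data to be pushed to stable storage. */
    if (fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* close() can also report a deferred write error, so check it too. */
    return close(fd) < 0 ? -1 : 0;
}
```

A caller would use it as write_durable("/some/nfs/path", buf, len, 0) and treat any non-zero return as "data may not be on disk".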

I really don't understand why the async option is implemented on the server export, although a sync option there could force sync for all clients of that export. What am I missing? Is there some good reason (rather than history) that it is done this way?

I just tried fsync() with an NFS async mount and it appears to work. With a simple 'C' test program I see the following data rates/times when the program writes 100 MBytes to a single file over NFS (open, write, write .., fsync), followed by close (after the timing):

NFS Write multiple small files 0.001584 ms/per file 0.615829 MBytes/sec CpuUsage: 3.2%
Disktest: Writing/Reading 100.00 MBytes in 1048576 Byte Chunks
Disk Write sequential data rate fsync: 1 107.250685 MBytes/sec CpuUsage: 13.4%
Disk Write sequential data rate fsync: 0 4758.953878 MBytes/sec CpuUsage: 66.7%

Without the fsync() call the data rate is obviously that of writing to buffers, and with the fsync() call it definitely looks like the rate of writing to disk.
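
As a sanity check on those figures, the elapsed times they imply can be worked out directly: 100 MBytes at 107.25 MBytes/sec is about 0.93 s, and at 4758.95 MBytes/sec about 0.021 s. The sketch below does that arithmetic and, purely as an assumption (the link speed in use is not stated here), compares the fsync rate with the raw line rate of a 1 Gbit/s Ethernet link, which is roughly 119 MBytes/sec when a MByte is 1024*1024 bytes, as in the test program. On that assumption the fsync figure is about 90% of the wire rate, so the network may be the limit as much as the disk. The function names are illustrative only.

```c
#include <stdio.h>

#define MIB (1024.0 * 1024.0)   /* the test program's "MByte" */

/* Seconds needed to move mbytes MBytes at rate_mb_s MBytes/sec. */
double transfer_seconds(double mbytes, double rate_mb_s)
{
    return mbytes / rate_mb_s;
}

/* Raw line rate of a link in MBytes/sec, ignoring all protocol overhead. */
double link_rate_mb_s(double bits_per_second)
{
    return bits_per_second / 8.0 / MIB;
}

/* Print the implied times for the two measured rates quoted above. */
void print_sanity_check(void)
{
    double with_fsync = 107.250685;     /* measured, fsync: 1 */
    double without_fsync = 4758.953878; /* measured, fsync: 0 */
    double gigabit = link_rate_mb_s(1e9); /* ASSUMED link speed */

    printf("with fsync:    %.3f s\n", transfer_seconds(100.0, with_fsync));
    printf("without fsync: %.3f s\n", transfer_seconds(100.0, without_fsync));
    printf("fsync rate as fraction of 1 Gbit/s line rate: %.2f\n",
           with_fsync / gigabit);
}
```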

Interestingly, it appears that the close() call effectively does an fsync() as well, since the close() takes an age when fsync() is not used.

(By the way, I just got bitten by a Fedora27 KDE/plasma/NetworkManager change that sets the Ethernet interfaces of all my systems to 100 MBits/s half duplex. It looks like the ability to configure Ethernet auto negotiation has been added and the default is fixed 100 MBits/s half duplex!)
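
One way to confirm where the time goes would be to time write(), fsync() and close() as separate phases. The following is a minimal sketch of that idea, not the actual test program: now_seconds() is a stand-in for the getTime() used in the test code (whose implementation is not shown), and struct phase_times and time_write_phases() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Seconds from a monotonic clock (stand-in for the test code's getTime()). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical result record: seconds spent in each phase. */
struct phase_times {
    double write_s;
    double fsync_s;   /* 0 when do_fsync is 0 */
    double close_s;
};

/* Write num_chunks chunks of chunk_size bytes to path, timing write(),
 * fsync() and close() separately. Returns 0 on success, -1 on error. */
int time_write_phases(const char *path, size_t chunk_size, int num_chunks,
                      int do_fsync, struct phase_times *out)
{
    assert(path != NULL && out != NULL);

    char *buf = malloc(chunk_size);
    if (buf == NULL)
        return -1;
    memset(buf, 0xA5, chunk_size);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    int rc = 0;
    double t0 = now_seconds();
    for (int i = 0; i < num_chunks && rc == 0; i++) {
        if (write(fd, buf, chunk_size) != (ssize_t)chunk_size)
            rc = -1;  /* treat short writes as errors, as the test code does */
    }
    double t1 = now_seconds();
    if (do_fsync && fsync(fd) < 0)
        rc = -1;
    double t2 = now_seconds();
    if (close(fd) < 0)
        rc = -1;
    double t3 = now_seconds();

    out->write_s = t1 - t0;
    out->fsync_s = do_fsync ? t2 - t1 : 0.0;
    out->close_s = t3 - t2;
    free(buf);
    return rc;
}
```

If close() really does flush on NFS, then with do_fsync set to 0 most of the elapsed time should show up in close_s rather than write_s.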

Basic test code (just the write function):

void nfsPerfWrite(int doFsync){
    int        f;
    char        buf[bufSize];
    int        n;
    double        st, et, r;
    int        nb;
    int        numBuf;
    CpuStat        cpuStatStart;
    CpuStat        cpuStatEnd;
    double        cpuUsed;
    double        cpuUsage;

    f = open64(fileName, O_RDWR | O_CREAT, 0666);
    if(f < 0){
        fprintf(stderr, "Error creating %s: %s\n", fileName, strerror(errno));
        return;
    }

    memset(buf, 0, bufSize);    // Write defined data rather than stack garbage

    // cpuStatStart is sampled here (the sampling call is not shown in this extract)
    st = getTime();
    for(n = 0; n < diskNum; n++){
        if((nb = write(f, buf, bufSize)) != bufSize)
            fprintf(stderr, "WriteError: %d\n", nb);
    }

    if(doFsync)
        fsync(f);        // The flush is inside the timed region

    et = getTime();
    // cpuStatEnd is sampled here (again, the call is not shown in this extract)

    // Turn the end sample into the delta over the timed region
    cpuStatEnd.user = cpuStatEnd.user - cpuStatStart.user;
    cpuStatEnd.nice = cpuStatEnd.nice - cpuStatStart.nice;
    cpuStatEnd.sys = cpuStatEnd.sys - cpuStatStart.sys;
    cpuStatEnd.idle = cpuStatEnd.idle - cpuStatStart.idle;
    cpuStatEnd.wait = cpuStatEnd.wait - cpuStatStart.wait;
    cpuStatEnd.hi = cpuStatEnd.hi - cpuStatStart.hi;
    cpuStatEnd.si = cpuStatEnd.si - cpuStatStart.si;

    // Busy time as a fraction of busy + idle time
    cpuUsed = (cpuStatEnd.user + cpuStatEnd.nice + cpuStatEnd.sys + cpuStatEnd.hi + cpuStatEnd.si);
    cpuUsage = cpuUsed / (cpuUsed + cpuStatEnd.idle);

    r = ((double)diskNum * bufSize) / (et - st);
    printf("Disk Write sequential data rate fsync: %d %f MBytes/sec CpuUsage: %.1f%%\n", doFsync, r / (1024*1024), cpuUsage * 100);

    close(f);            // Deliberately after the timing
}
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
