On 12/02/18 22:14, J. Bruce Fields wrote:
On Mon, Feb 12, 2018 at 08:12:58PM +0000, Terry Barnaby wrote:
On 12/02/18 17:35, Terry Barnaby wrote:
On 12/02/18 17:15, J. Bruce Fields wrote:
On Mon, Feb 12, 2018 at 05:09:32PM +0000, Terry Barnaby wrote:
One thing on this, that I forgot to ask: doesn't fsync() work properly with an NFS server-side async mount then?
No.

If a server sets "async" on an export, there is absolutely no way for a
client to guarantee that data reaches disk, or to know when it happens.

Possibly "ignore_sync", or "unsafe_sync", or something else, would be a
better name.
...
Just tried the use of fsync() with an NFS async mount, it appears to work.
That's expected, it's the *export* option that cheats, not the mount
option.

Also, even if you're using the async export option--fsync will still
flush data to server memory, just not necessarily to disk.

With a simple 'C' program as a test program I see the following data rates/times when the program writes 100 MBytes to a single file over NFS (open, write, write .., fsync) followed by close (after the timing):

NFS Write multiple small files 0.001584 ms per file 0.615829 MBytes/sec CpuUsage: 3.2%
Disktest: Writing/Reading 100.00 MBytes in 1048576 Byte Chunks
Disk Write sequential data rate fsync: 1 107.250685 MBytes/sec CpuUsage: 13.4%
Disk Write sequential data rate fsync: 0 4758.953878 MBytes/sec CpuUsage: 66.7%

Without the fsync() call the data rate is obviously to buffers, and with the fsync() call it definitely looks like it is going to disk.
Could be, or you could be network-limited, hard to tell without knowing more.

Interestingly, it appears that the close() call actually does an effective fsync() as well, as the close() takes an age when fsync() is not used.
Yes: http://nfs.sourceforge.net/#faq_a8

--b.

Quite right, it was network-limited (disk vs network speed is about the same). Using a slower USB stick as the disk shows that fsync() is not working with an NFSv4 "async" export.
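
To make the distinction concrete, the option that cheats lives in /etc/exports on the server, not in the client's mount options. Roughly (the paths and hosts below are just placeholders, not my actual setup):

    # /etc/exports on the server -- this is the side that "cheats":
    #   async: the server replies before the data has reached its disk
    #   sync:  the server replies only once the data is on disk
    /srv/export   *(rw,async,no_subtree_check)
    # ...vs...
    /srv/export   *(rw,sync,no_subtree_check)

    # The client-side mount option only controls client write-behind and
    # cannot make an "async" export safe:
    #   mount -t nfs -o vers=4.2,sync server:/srv/export /mnt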

But why is this? It just doesn't make sense to me that fsync() should work this way, even with an NFS "async" export. Why shouldn't it do the right thing and "synchronize a file's in-core state with storage device"? (I don't consider an NFS server a storage device, only the non-volatile devices it uses.) It seems it would be easy to flush the client's write buffer to the NFS server (as it does now) and then perform the fsync() on the server for the file in question. What am I missing?
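
For reference, the timing test used above is nothing more elaborate than something along these lines; the file path and the sizes are placeholders:

    /* Minimal sketch of a write/fsync timing test over an NFS mount.
     * The file path and the sizes are placeholders. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    static double now(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        const size_t chunk = 1048576;     /* 1 MByte chunks, as in the output above */
        const size_t total = 100 * chunk; /* 100 MBytes in total */
        char *buf = calloc(1, chunk);
        int fd = open("/mnt/nfs/disktest.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0 || buf == NULL) { perror("setup"); return 1; }

        double t0 = now();
        for (size_t done = 0; done < total; done += chunk)
            if (write(fd, buf, chunk) != (ssize_t)chunk) { perror("write"); return 1; }
        fsync(fd);                        /* drop this line for the "fsync: 0" case */
        double t1 = now();

        printf("Disk Write sequential data rate: %f MBytes/sec\n",
               (total / 1048576.0) / (t1 - t0));
        close(fd);                        /* close() itself flushes to the server */
        free(buf);
        return 0;
    }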


Thinking out loud (and without a great deal of thought), on removing the NFS export "async" option, improving small-file write performance and keeping data security, it seems to me one method might be:

1. The NFS server is always in "async" export mode (the client can mount in sync mode if wanted). Data and metadata (optionally) are buffered in RAM on client and server.

2. Client fsync() works all the way to disk on the server.

3. Client sync() does an fsync() of each open-for-write NFS file. (Maybe this will be too much load on NFS servers ...)

4. You implement NFSv4 write delegations :)

5. There is a transaction-based system for file writes (a rough sketch of the client-side bookkeeping follows the list):

5.1 When a file is opened for write, a transaction is created (id). This is sent with the OPEN call.

5.2 Further file operations, including SETATTR and WRITE, are allocated as stages in this transaction (id.stage) and are just buffered in the client (no direct server RPC calls).

5.3 The client sends the NFS operations for this write to the server as and when, optimised into full-sized network packets, but the data and metadata are kept buffered in the client.

5.4 The server stores the data in its normal FS RAM buffers during the NFS RPC calls.

5.5 When the server actually writes the data to disk (using its normal optimised disk writing system for the file system and device in question), the transaction and stage (id.stage) are returned to the client (within an NFS reply). The client can now release the buffers up to this stage in the transaction.
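
To make 5.1-5.5 a little more concrete, here is a very rough sketch of the client-side bookkeeping it would imply. Every structure and function name here is hypothetical; this is not existing NFS client code, just an illustration of the idea:

    /* Hypothetical client-side bookkeeping for the transaction idea above.
     * None of these names exist in the real NFS client; this is only a sketch. */
    #include <stdint.h>
    #include <stdlib.h>

    struct nfs_txn_stage {
        uint32_t stage;                /* position within the transaction (id.stage) */
        void *data;                    /* buffered WRITE/SETATTR payload */
        size_t len;
        struct nfs_txn_stage *next;
    };

    struct nfs_txn {
        uint64_t id;                   /* allocated at OPEN time, sent with the OPEN call (5.1) */
        uint32_t next_stage;
        struct nfs_txn_stage *pending; /* ops still only buffered on the client, oldest first */
    };

    /* 5.2: buffer an operation locally and tag it with the next stage number. */
    uint32_t txn_add_stage(struct nfs_txn *t, void *data, size_t len)
    {
        struct nfs_txn_stage *s = calloc(1, sizeof(*s));
        struct nfs_txn_stage **p = &t->pending;

        s->stage = t->next_stage++;
        s->data = data;
        s->len = len;
        while (*p)                     /* append at the tail to keep stage order */
            p = &(*p)->next;
        *p = s;
        return s->stage;
    }

    /* 5.5: the server has reported "everything up to acked_stage is on disk",
     * so the client may now drop its own copies up to and including that stage. */
    void txn_release_upto(struct nfs_txn *t, uint32_t acked_stage)
    {
        while (t->pending && t->pending->stage <= acked_stage) {
            struct nfs_txn_stage *s = t->pending;

            t->pending = s->next;
            free(s->data);
            free(s);
        }
    }

The point of 5.5 is simply that a buffered stage is only freed once the server has reported that stage as safely on disk, so the client always holds enough to replay after a server crash.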

The transaction system allows the write delegation to send the data to the server's RAM without the overhead of synchronous writes to the disk.

It does mean the data is stored in RAM in both the client and server at the same time (twice as much RAM usage). I am not sure how easy it would be to implement in the Linux kernel (would NFS be informed when FS buffers are freed?), and it would require NFS protocol extensions for the transactions.

With this method the client can resend the data on a server failure/reboot, and the data can be ensured to be on the disk after an fsync() or sync() (within reason!). It should offer the fastest write performance, should eliminate the untar performance issue with small-file creation/writes, and would still be relatively secure with data if the server dies. Unless I am missing something?


PS: I have some RPC latency figures for some other NFS servers at work. The NFS RPC latency on some of them is nearer the ICMP ping times, i.e. about 100 us. Maybe quite a bit of CPU is needed to respond to an NFS RPC call these days. The 500 us RPC time was on an oldish home server using an Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz.

Terry