On 12/02/18 22:14, J. Bruce Fields wrote:
On Mon, Feb 12, 2018 at 08:12:58PM +0000, Terry Barnaby wrote:
On 12/02/18 17:35, Terry Barnaby wrote:
On 12/02/18 17:15, J. Bruce Fields wrote:
On Mon, Feb 12, 2018 at 05:09:32PM +0000, Terry Barnaby wrote:
One thing on this, that I forgot to ask: doesn't fsync() work properly with an NFS server-side async mount then?
No.

If a server sets "async" on an export, there is absolutely no way for a
client to guarantee that data reaches disk, or to know when it happens.

Possibly "ignore_sync", or "unsafe_sync", or something else, would be a
better name.
...
Just tried the use of fsync() with an NFS async mount, it appears to work.
That's expected, it's the *export* option that cheats, not the mount
option.

Also, even if you're using the async export option--fsync will still
flush data to server memory, just not necessarily to disk.

With a simple 'C' program as a test program I see the following data rates/times when the program writes 100 MBytes to a single file over NFS (open, write, write .., fsync) followed by close (after the timing):

NFS Write multiple small files 0.001584 ms per file 0.615829 MBytes/sec CpuUsage: 3.2%
Disktest: Writing/Reading 100.00 MBytes in 1048576 Byte Chunks
Disk Write sequential data rate fsync: 1 107.250685 MBytes/sec CpuUsage: 13.4%
Disk Write sequential data rate fsync: 0 4758.953878 MBytes/sec CpuUsage: 66.7%

Without the fsync() call the data rate is obviously to buffers, and with the fsync() call it definitely looks like it is going to disk.
Could be, or you could be network-limited, hard to tell without knowing more.

Interestingly, it appears that the close() call actually does an effective fsync() as well, as the close() takes an age when fsync() is not used.
Yes: http://nfs.sourceforge.net/#faq_a8

--b.

Quite right, it was network-limited (disk vs network speed is about the same). Using a slower USB stick as the disk shows that fsync() is not working with an NFSv4 "async" export.
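
To make the distinction concrete, the option that cheats lives in /etc/exports on the server, not in the client's mount options. Roughly (the paths and hosts below are just placeholders, not my actual setup):

    # /etc/exports on the server -- this is the side that "cheats":
    #   async: the server replies before the data has reached its disk
    #   sync:  the server replies only once the data is on disk
    /srv/export   *(rw,async,no_subtree_check)
    # ...vs...
    /srv/export   *(rw,sync,no_subtree_check)

    # The client-side mount option only controls client write-behind and
    # cannot make an "async" export safe:
    #   mount -t nfs -o vers=4.2,sync server:/srv/export /mnt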

But why is this? It just doesn't make sense to me that fsync() should work this way, even with an NFS "async" export. Why shouldn't it do the right thing and "synchronize a file's in-core state with storage device"? (I don't consider an NFS server a storage device, only the non-volatile devices it uses.) It seems it would be easy to flush the client's write buffer to the NFS server (as it does now) and then perform the fsync() on the server for the file in question. What am I missing?
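
For reference, the timing test used above is nothing more elaborate than something along these lines; the file path and the sizes are placeholders:

    /* Minimal sketch of a write/fsync timing test over an NFS mount.
     * The file path and the sizes are placeholders. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    static double now(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        const size_t chunk = 1048576;     /* 1 MByte chunks, as in the output above */
        const size_t total = 100 * chunk; /* 100 MBytes in total */
        char *buf = calloc(1, chunk);
        int fd = open("/mnt/nfs/disktest.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0 || buf == NULL) { perror("setup"); return 1; }

        double t0 = now();
        for (size_t done = 0; done < total; done += chunk)
            if (write(fd, buf, chunk) != (ssize_t)chunk) { perror("write"); return 1; }
        fsync(fd);                        /* drop this line for the "fsync: 0" case */
        double t1 = now();

        printf("Disk Write sequential data rate: %f MBytes/sec\n",
               (total / 1048576.0) / (t1 - t0));
        close(fd);                        /* close() itself flushes to the server */
        free(buf);
        return 0;
    }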


Thinking out loud (and without a great deal of thought), on removing the NFS export "async" option, improving small-file write performance and keeping data security, it seems to me one method might be:

1. The NFS server is always in "async" export mode (the client can mount in sync mode if wanted). Data and metadata (optionally) are buffered in RAM on client and server.

2. Client fsync() works all the way to disk on the server.

3. Client sync() does an fsync() of each open-for-write NFS file. (Maybe this will be too much load on NFS servers ...)

4. You implement NFSv4 write delegations :)

5. There is a transaction-based system for file writes (a rough sketch of the client-side bookkeeping follows the list):

5.1 When a file is opened for write, a transaction is created (id). This is sent with the OPEN call.

5.2 Further file operations, including SETATTR and WRITE, are allocated as stages in this transaction (id.stage) and are just buffered in the client (no direct server RPC calls).

5.3 The client sends the NFS operations for this write to the server as and when, optimised into full-sized network packets, but the data and metadata are kept buffered in the client.

5.4 The server stores the data in its normal FS RAM buffers during the NFS RPC calls.

5.5 When the server actually writes the data to disk (using its normal optimised disk writing system for the file system and device in question), the transaction and stage (id.stage) are returned to the client (within an NFS reply). The client can now release the buffers up to this stage in the transaction.
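
To make 5.1-5.5 a little more concrete, here is a very rough sketch of the client-side bookkeeping it would imply. Every structure and function name here is hypothetical; this is not existing NFS client code, just an illustration of the idea:

    /* Hypothetical client-side bookkeeping for the transaction idea above.
     * None of these names exist in the real NFS client; this is only a sketch. */
    #include <stdint.h>
    #include <stdlib.h>

    struct nfs_txn_stage {
        uint32_t stage;                /* position within the transaction (id.stage) */
        void *data;                    /* buffered WRITE/SETATTR payload */
        size_t len;
        struct nfs_txn_stage *next;
    };

    struct nfs_txn {
        uint64_t id;                   /* allocated at OPEN time, sent with the OPEN call (5.1) */
        uint32_t next_stage;
        struct nfs_txn_stage *pending; /* ops still only buffered on the client, oldest first */
    };

    /* 5.2: buffer an operation locally and tag it with the next stage number. */
    uint32_t txn_add_stage(struct nfs_txn *t, void *data, size_t len)
    {
        struct nfs_txn_stage *s = calloc(1, sizeof(*s));
        struct nfs_txn_stage **p = &t->pending;

        s->stage = t->next_stage++;
        s->data = data;
        s->len = len;
        while (*p)                     /* append at the tail to keep stage order */
            p = &(*p)->next;
        *p = s;
        return s->stage;
    }

    /* 5.5: the server has reported "everything up to acked_stage is on disk",
     * so the client may now drop its own copies up to and including that stage. */
    void txn_release_upto(struct nfs_txn *t, uint32_t acked_stage)
    {
        while (t->pending && t->pending->stage <= acked_stage) {
            struct nfs_txn_stage *s = t->pending;

            t->pending = s->next;
            free(s->data);
            free(s);
        }
    }

The point of 5.5 is simply that a buffered stage is only freed once the server has reported that stage as safely on disk, so the client always holds enough to replay after a server crash.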

The transaction system allows the write delegation to send the data to the server's RAM without the overhead of synchronous writes to the disk.

It does mean the data is stored in RAM in both the client and server at the same time (twice as much RAM usage). I am not sure how easy it would be to implement in the Linux kernel (would NFS be informed when FS buffers are freed?), and it would require NFS protocol extensions for the transactions.

With this method the client can resend the data on a server failure/reboot, and the data can be ensured to be on the disk after an fsync() or sync() (within reason!). It should offer the fastest write performance, should eliminate the untar performance issue with small-file creation/writes, and would still be relatively secure with data if the server dies. Unless I am missing something?


PS: I have some RPC latency figures for some other NFS servers at work. The NFS RPC latency on some of them is nearer the ICMP ping times, i.e. about 100 us. Maybe quite a bit of CPU is needed to respond to an NFS RPC call these days. The 500 us RPC time was on an oldish home server using an Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz.

Terry