On 12/02/18 22:14, J. Bruce Fields wrote:
> On Mon, Feb 12, 2018 at 08:12:58PM +0000, Terry Barnaby wrote:
>> On 12/02/18 17:35, Terry Barnaby wrote:
>>> On 12/02/18 17:15, J. Bruce Fields wrote:
>>>> On Mon, Feb 12, 2018 at 05:09:32PM +0000, Terry Barnaby wrote:
>>>>> One thing on this, that I forgot to ask: doesn't fsync() work with
>>>>> an NFS server-side async mount then?
>>>> If a server sets "async" on an export, there is absolutely no way for a
>>>> client to guarantee that data reaches disk, or to know when it happens.
>>>> Possibly "ignore_sync", or "unsafe_sync", or something else, would be a
>>>> better name for it.
>>> Just tried the use of fsync() with an NFS async mount, it appears to work.
> That's expected: it's the *export* option that cheats, not the mount
> option.
>
> Also, even if you're using the async export option, fsync will still
> flush data to server memory, just not necessarily to disk.
>> With a simple 'C' program as a test program I see the following data
>> rates/times when the program writes 100 MBytes to a single file over NFS
>> (open, write, write ..., fsync) followed by close (after the timing):
>>
>> NFS Write multiple small files: 0.001584 ms per file, 0.615829 MBytes/sec
>>
>> Disktest: Writing/Reading 100.00 MBytes in 1048576 Byte Chunks
>> Disk Write sequential data rate fsync: 1 107.250685 MBytes/sec CpuUsage:
>> Disk Write sequential data rate fsync: 0 4758.953878 MBytes/sec CpuUsage:
>>
>> Without the fsync() call the data rate is obviously just to buffers, and
>> with the fsync() call it definitely looks like it is to disk.
> Could be, or you could be network-limited; hard to tell without knowing
> more.
>> Interestingly, it appears that the close() call actually does an effective
>> fsync() as well, as the close() takes an age when fsync() is not used.
Quite right, it was network limited (disk vs network speed is about the
same). Using a slower USB stick as the disk shows that fsync() is not
working with an NFSv4 "async" export.
But why is this? It just doesn't make sense to me that fsync() should
work this way even with an NFS "async" export. Why shouldn't it do the
right thing and "synchronize a file's in-core state with storage device"?
(I don't consider an NFS server a storage device, only the non-volatile
devices it uses.) It seems it would be easy to flush the client's write
buffer to the NFS server (as it does now) and then perform the fsync()
on the server for the file in question. What am I missing?
Thinking out loud (and without a great deal of thought): on removing the
NFS export "async" option, improving small-file write performance, and
keeping data safe, it seems to me one method might be:
1. The NFS server is always in "async" export mode (the client can mount
in sync mode if wanted). Data and, optionally, metadata are buffered in
RAM on client and server.
2. Client fsync() works all the way to disk on the server.
3. Client sync() does an fsync() of each NFS file open for write. (Maybe
this would be too much load on NFS servers ...)
4. You implement NFSv4 write delegations :)
5. There is a transaction based system for file writes:
5.1 When a file is opened for write, a transaction is created (id). This
is sent with the OPEN call.
5.2 Further file operations including SETATTR, WRITE are allocated as
stages in this transaction (id.stage) and are just buffered in the
client (no direct server RPC calls).
5.3 The client sends the NFS operations for this write to the server as
and when, optimised into full-sized network packets; but the data and
metadata are kept buffered in the client.
5.4 The server stores the data in its normal FS RAM buffers during the
NFS RPC calls.
5.5 When the server actually writes the data to disk (using its normal
optimised disk writing system for the file system and device in
question), the transaction and stage (id.stage) are returned to the
client (within an NFS reply). The client can now release the buffers up
to this stage in the transaction.
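The client-side bookkeeping in step 5 might look something like the
following sketch. To be clear, every name and structure here is my own
invention for illustration; nothing like this exists in the current NFS
client code:

```c
/* Illustrative client-side bookkeeping for the proposed transaction
 * scheme (5.1-5.5). All names and structures are invented for this
 * sketch, not taken from any existing NFS implementation. */
#include <assert.h>
#include <stdlib.h>

struct stage_buf {
    unsigned stage;            /* position in the transaction (id.stage) */
    void *data;                /* buffered WRITE/SETATTR payload */
    struct stage_buf *next;
};

struct txn {
    unsigned id;               /* allocated at OPEN (5.1) */
    unsigned next_stage;       /* next stage number to hand out (5.2) */
    struct stage_buf *pending; /* stages not yet committed on the server */
};

/* 5.2: buffer an operation locally and give it a stage number. */
static unsigned txn_add_stage(struct txn *t, void *data)
{
    struct stage_buf *s = malloc(sizeof(*s));
    s->stage = t->next_stage++;
    s->data = data;
    s->next = t->pending;
    t->pending = s;
    return s->stage;
}

/* 5.5: the server reported (id.stage) safely on disk; release every
 * buffered stage up to and including it. Returns how many were freed. */
static int txn_commit_reply(struct txn *t, unsigned stage)
{
    int freed = 0;
    struct stage_buf **pp = &t->pending;
    while (*pp) {
        if ((*pp)->stage <= stage) {
            struct stage_buf *s = *pp;
            *pp = s->next;
            free(s);
            freed++;
        } else {
            pp = &(*pp)->next;
        }
    }
    return freed;
}
```

One nice property of tracking a high-water stage like this is that the
commit reply stays small: a single (id, stage) pair acknowledges every
buffered operation up to that point.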
The transaction system allows the write delegation to send the data to
the server's RAM without the overhead of synchronous writes to the disk.
It does mean the data is stored in RAM in both the client and the server
at the same time (twice as much RAM usage). Not sure how easy it would be
to implement in the Linux kernel (would NFS be informed when the FS
buffers are freed?), and it would require NFS protocol extensions for the
transactions.
With this method the client can resend the data on a server failure or
reboot, and the data can be ensured to be on the disk after an fsync() or
sync() (within reason!). It should offer the fastest write performance,
should eliminate the untar performance issue with small-file
creation/writes, and would still be relatively safe with data if the
server dies. Unless I am missing something?
PS: I have some RPC latency figures for some other NFS servers at work.
The NFS RPC latency on some of them is nearer the ICMP ping times, i.e.
about 100us. Maybe quite a bit of CPU is needed to respond to an NFS RPC
call these days. The 500us RPC time was on an oldish home server using an
Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz.
devel mailing list -- email@example.com
To unsubscribe send an email to devel-le...@lists.fedoraproject.org