On Tue, 2020-09-22 at 13:31 +0100, Daire Byrne wrote:
> Hi, 
> 
> I just thought I'd flesh out the other two issues I have found with 
> re-exporting that are ultimately responsible for the biggest performance 
> bottlenecks. And both of them revolve around the caching of metadata file 
> lookups in the NFS client.
> 
> Especially for the case where we are re-exporting a server many milliseconds 
> away (i.e. on-premise -> cloud), we want to be able to control how much the 
> client caches metadata and file data so that its many LAN clients all 
> benefit from the re-export server only having to do the WAN lookups once 
> (within a specified coherency time).
> 
> Keeping the file data in the vfs page cache or on disk using 
> fscache/cachefiles is fairly straightforward, but keeping the metadata cached 
> is particularly difficult. And without the cached metadata we introduce long 
> delays before we can serve the already present and locally cached file data 
> to many waiting clients.
> 
> ----- On 7 Sep, 2020, at 18:31, Daire Byrne [email protected] wrote:
> > 2) If we cache metadata on the re-export server using actimeo=3600,nocto we 
> > can
> > cut the network packets back to the origin server to zero for repeated 
> > lookups.
> > However, if a client of the re-export server walks paths and memory maps 
> > those
> > files (i.e. loading an application), the re-export server starts issuing
> > unexpected calls back to the origin server again, ignoring/invalidating the
> re-export server's NFS client cache. We worked around this by patching 
> an
> > inode/iversion validity check in inode.c so that the NFS client cache on the
> > re-export server is used. I'm not sure about the correctness of this patch 
> > but
> > it works for our corner case.
> 
> If we use actimeo=3600,nocto (say) to mount a remote software volume on the 
> re-export server, we can successfully cache the loading of applications and 
> walking of paths directly on the re-export server such that after a couple of 
> runs, there are practically zero packets back to the originating NFS server 
> (great!). But, if we then do the same thing on a client which is mounting 
> that re-export server, the re-export server now starts issuing lots of calls 
> back to the originating server and invalidating its client cache (bad!).
> 
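For reference, a config-fragment sketch of the kind of mounts being described (hostnames, paths, and the 3600s coherency window are illustrative, not from the original setup):

```shell
# On the re-export server: mount the distant "origin" server with long
# attribute caching and close-to-open consistency disabled, so repeated
# path walks are answered from the local cache within the actimeo window.
mount -t nfs -o actimeo=3600,nocto origin.example.com:/sw /srv/sw

# Then re-export /srv/sw to the LAN clients, e.g. in /etc/exports:
#   /srv/sw  192.168.0.0/24(ro,no_subtree_check,fsid=1)
```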
> I'm not exactly sure why, but the inode's iversion gets changed locally 
> (due to atime modification?), most likely via inode_inc_iversion_raw. Each 
> time it is incremented, the next attribute validation detects the change 
> and reloads the attributes from the originating server.
> 

I'd expect the change attribute to track what's in the actual inode on the
"home" server. The NFS client is supposed to (mostly) keep the raw
change attribute in its i_version field.

The only place we call inode_inc_iversion_raw is in
nfs_inode_add_request, which I don't think you'd be hitting unless you
were writing to the file while holding a write delegation.

What sort of server is hosting the actual data in your setup?


> This patch helps to avoid this when applied to the re-export server but there 
> may be other places where this happens too. I accept that this patch is 
> probably not the right/general way to do this, but it helps to highlight the 
> issue when re-exporting and it works well for our use case:
> 
> --- linux-5.5.0-1.el7.x86_64/fs/nfs/inode.c     2020-01-27 00:23:03.000000000 
> +0000
> +++ new/fs/nfs/inode.c  2020-02-13 16:32:09.013055074 +0000
> @@ -1869,7 +1869,7 @@
>  
>         /* More cache consistency checks */
>         if (fattr->valid & NFS_ATTR_FATTR_CHANGE) {
> -               if (!inode_eq_iversion_raw(inode, fattr->change_attr)) {
> +               if (inode_peek_iversion_raw(inode) < fattr->change_attr) {
>                         /* Could it be a race with writeback? */
>                         if (!(have_writers || have_delegation)) {
>                                 invalid |= NFS_INO_INVALID_DATA
> 
> With this patch, the re-export server's NFS client attribute cache is 
> maintained and used by all the clients that then mount it. When many hundreds 
> of clients are all doing similar things at the same time, the re-export 
> server's NFS client cache is invaluable in accelerating the lookups 
> (getattrs).
> 
> Perhaps a more correct approach would be to detect when it is knfsd that is 
> accessing the client mount and change the cache consistency checks 
> accordingly? 

Yeah, I don't think you can do this for the reasons Trond outlined.
-- 
Jeff Layton <[email protected]>

--
Linux-cachefs mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cachefs
