On Wed, Jun 15, 2022 at 09:09:12AM -0700, Cy Schubert wrote:
| Thanks. This fixes it. 16 simultaneous tar cf /dev/null /usr/obj on three 
| separate NFS clients:
| 
| Server:
|       Clients    OpenOwner        Opens    LockOwner        Locks       
| Delegs
|             3           48           54            0            0           
|  0
|       Layouts
|             0
| 
| The load on the NFS server is ~ 0.5 to 2 on a four core machine with CPU 
| sys busy between 2% and 30%. Prior to this CPU sys was pegged at close to 
| 100% with a load of 18. The regression has been resolved. Thanks.

Thanks for the confirmation.  Seems like statfs is used by a lot of things.

Doug A.
| In message <[email protected]>, Doug Ambrisko writes:
| > On Wed, Jun 15, 2022 at 07:23:51AM -0700, Doug Ambrisko wrote:
| > | On Wed, Jun 15, 2022 at 07:05:14AM -0700, Cy Schubert wrote:
| > | | Can we revert this, please. It breaks NFSv4.
| > | 
| > | It would be nice if you could try the proposed partial revert.
| > | I'm planning to commit that shortly.
| >
| > It is in ce00b11940ab.
| >
| > Please let me know how that works.
| >
| > Thanks,
| >
| > Doug A.
| > | | In message <[email protected]>, Cy Schubert 
writes
| > :
| > | | > In message 
<[email protected]
| > ail.c
| > | | > om>
| > | | > , Mateusz Guzik writes:
| > | | > > On 6/13/22, Doug Ambrisko <[email protected]> wrote:
| > | | > > > On Mon, Jun 13, 2022 at 06:43:31PM +0200, Mateusz Guzik wrote:
| > | | > > > | On 6/13/22, Doug Ambrisko <[email protected]> wrote:
| > | | > > > | > The branch main has been updated by ambrisko:
| > | | > > > | >
| > | | > > > | > URL:
| > | | > > > | >
| > | | > > > 
https://cgit.FreeBSD.org/src/commit/?id=6468cd8e0ef9d1d3331e9de26cd
| > 2be59b
| > | | > c7
| > | | > > 78494
| > | | > > > | >
| > | | > > > | > commit 6468cd8e0ef9d1d3331e9de26cd2be59bc778494
| > | | > > > | > Author:     Doug Ambrisko <[email protected]>
| > | | > > > | > AuthorDate: 2022-06-13 14:56:38 +0000
| > | | > > > | > Commit:     Doug Ambrisko <[email protected]>
| > | | > > > | > CommitDate: 2022-06-13 14:56:38 +0000
| > | | > > > | >
| > | | > > > | >     mount: add vnode usage per file system with mount -v
| > | | > > > | >
| > | | > > > | >     This avoids the need to drop into the ddb to figure out 
vno
| > de
| > | | > > > | >     usage per file system.  It helps to see if they are or 
are 
| > not
| > | | > > > | >     being freed.  Suggestion to report active vnode count was 
f
| > rom
| > | | > > > | >     kib@
| > | | > > > | >
| > | | > > > | >     Reviewed by:    kib
| > | | > > > | >     Differential Revision: https://reviews.freebsd.org/D35436
| > | | > > > | > ---
| > | | > > > | >  sbin/mount/mount.c   |  7 +++++++
| > | | > > > | >  sys/kern/vfs_mount.c | 12 ++++++++++++
| > | | > > > | >  sys/sys/mount.h      |  4 +++-
| > | | > > > | >  3 files changed, 22 insertions(+), 1 deletion(-)
| > | | > > > | >
| > | | > > > | > diff --git a/sbin/mount/mount.c b/sbin/mount/mount.c
| > | | > > > | > index 79d9d6cb0caf..bd3d0073c474 100644
| > | | > > > | > --- a/sbin/mount/mount.c
| > | | > > > | > +++ b/sbin/mount/mount.c
| > | | > > > | > @@ -692,6 +692,13 @@ prmount(struct statfs *sfp)
| > | | > > > | >                       xo_emit("{D:, }{Lw:fsid}{:fsid}", fsidb
| > uf);
| > | | > > > | >                       free(fsidbuf);
| > | | > > > | >               }
| > | | > > > | > +             if (sfp->f_nvnodelistsize != 0 || sfp->f_avnode
| > count !=
| > | | > >  0) {
| > | | > > > | > +                     xo_open_container("vnodes");
| > | | > > > | > +                     xo_emit("{D:,
| > | | > > > | > }{Lwc:vnodes}{Lw:count}{w:count/%ju}{Lw:active}{:active/%ju}",
| > | | > > > | > +                         (uintmax_t)sfp->f_nvnodelistsize,
| > | | > > > | > +                         (uintmax_t)sfp->f_avnodecount);
| > | | > > > | > +                     xo_close_container("vnodes");
| > | | > > > | > +             }
| > | | > > > | >       }
| > | | > > > | >       xo_emit("{D:)}\n");
| > | | > > > | >  }
| > | | > > > | > diff --git a/sys/kern/vfs_mount.c b/sys/kern/vfs_mount.c
| > | | > > > | > index 71a40fd97a9c..e3818b67e841 100644
| > | | > > > | > --- a/sys/kern/vfs_mount.c
| > | | > > > | > +++ b/sys/kern/vfs_mount.c
| > | | > > > | > @@ -2610,6 +2610,8 @@ vfs_copyopt(struct vfsoptlist *opts, 
cons
| > t char
| > | | > > > *name,
| > | | > > > | > void *dest, int len)
| > | | > > > | >  int
| > | | > > > | >  __vfs_statfs(struct mount *mp, struct statfs *sbp)
| > | | > > > | >  {
| > | | > > > | > +     struct vnode *vp;
| > | | > > > | > +     uint32_t count;
| > | | > > > | >
| > | | > > > | >       /*
| > | | > > > | >        * Filesystems only fill in part of the structure for u
| > pdates, 
| > | | > > we
| > | | > > > | > @@ -2624,6 +2626,16 @@ __vfs_statfs(struct mount *mp, struct 
st
| > atfs
| > | | > > > *sbp)
| > | | > > > | >       sbp->f_version = STATFS_VERSION;
| > | | > > > | >       sbp->f_namemax = NAME_MAX;
| > | | > > > | >       sbp->f_flags = mp->mnt_flag & MNT_VISFLAGMASK;
| > | | > > > | > +     sbp->f_nvnodelistsize = mp->mnt_nvnodelistsize;
| > | | > > > | > +
| > | | > > > | > +     count = 0;
| > | | > > > | > +     MNT_ILOCK(mp);
| > | | > > > | > +     TAILQ_FOREACH(vp, &mp->mnt_nvnodelist, v_nmntvnodes) {
| > | | > > > | > +             if (vrefcnt(vp) > 0) /* racy but does not matte
| > r */
| > | | > > > | > +                     count++;
| > | | > > > | > +     }
| > | | > > > | > +     MNT_IUNLOCK(mp);
| > | | > > > | > +     sbp->f_avnodecount = count;
| > | | > > > | >
| > | | > > > |
| > | | > > > | libc uses statfs for dir walk (see gen/fts.c), most notably find
| > | | > > > | immediately runs into it. As such the linear scan by default is 
a
| > | | > > > | non-starter.
| > | | > > > |
| > | | > > > | I don't know if mount is the right place to dump this kind of 
inf
| > o to
| > | | > > > | begin with, but even so, it should only happen with a dedicated 
f
| > lag.
| > | | > > > |
| > | | > > > | As statfs does not take any flags on its own, there is no way to
| > | | > > > | prevent it from doing the above walk. Perhaps a dedicated 
sysctl 
| > which
| > | | > > > | takes mount point id could do the walk instead, when asked.
| > | | > > > |
| > | | > > > | Short of making the walk optional I'm afraid this will have to 
be
| > | | > > > reverted.
| > | | > > >
| > | | > > > Just to be clear, this isn't breaking things but is not optimal 
for
| > | | > > > things that don't need this extra info.
| > | | > > >
| > | | > >
| > | | > > It's not "not optimal", it's a significant overhead which taxes
| > | | > > frequent users which don't benefit from it.
| > | | >
| > | | > Indeed this is not optimal. Since this revision NFSv4 performance has 
| > | | > tanked. An installworld over NFSv4 which used to take approximately 
15 
| > | | > minutes took all night, not even finishing by morning. The NFS 
server, 
| > a 4 
| > | | > core machine, had a load average of 18 with three NFS clients 
attemptin
| > g 
| > | | > installworld with nfsd using over 90% of the cycles on the machine. I 
w
| > as 
| > | | > able to reproduce the problem by running a series of tar cf /dev/null 
| > | | > /usr/obj in parallel, on a single NFSv4 client to essentially DoS the 
N
| > FSv4 
| > | | > server.
| > | | >
| > | | > The workaround was to fall back to NFSv3, which was unaffected by 
this 
| > | | > revision.
| > | | >
| > | | > I reached out to our resident NFS person (rmacklem@) who suggested 
| > | | > reverting this revision, restoring the network to pre-regression 
state.
| > | | >
| > | | > >
| > | | > > For more data I plugged dtrace -n 'fbt::__vfs_statfs:entry {
| > | | > > @[execname] = count(); }' while package building, then i got tons of
| > | | > > hits:
| > | | > > [snip]
| > | | > >   expr                                                          
13992
| > | | > >   install                                                       
14090
| > | | > >   dirname                                                       
14921
| > | | > >   mv                                                            
17404
| > | | > >   ghc-stage1                                                    
17577
| > | | > >   grep                                                          
18998
| > | | > >   xgcc                                                          
23832
| > | | > >   cpp                                                           
29282
| > | | > >   cc1                                                           
36961
| > | | > >   sh                                                            
70575
| > | | > >   rm                                                            
73904
| > | | > >   ld.lld                                                        
87784
| > | | > >   sed                                                           
88803
| > | | > >   c++                                                           
98175
| > | | > >   cat                                                          
115811
| > | | > >   cc                                                           
449725
| > | | > >
| > | | > [...]
| > | | > > -- 
| > | | > > Mateusz Guzik <mjguzik gmail.com>
| > | | > >
| > | | >
| > | | >
| > | | > -- 
| > | | > Cheers,
| > | | > Cy Schubert <[email protected]>
| > | | > FreeBSD UNIX:  <[email protected]>   Web:  http://www.FreeBSD.org
| > | | > NTP:           <[email protected]>    Web:  https://nwtime.org
| > | | >
| > | | >                       e**(i*pi)+1=0
| > | | >
| > | | >
| > | | 
| 

Reply via email to