On 2023-09-04 14:26, Mateusz Guzik wrote:
> On 9/4/23, Alexander Leidinger <alexan...@leidinger.net> wrote:
>> On 2023-08-28 22:33, Alexander Leidinger wrote:
>>> On 2023-08-22 18:59, Mateusz Guzik wrote:
>>>> On 8/22/23, Alexander Leidinger <alexan...@leidinger.net> wrote:
>>>>> On 2023-08-21 10:53, Konstantin Belousov wrote:
>>>>>> On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote:
>>>>>>> On 2023-08-20 23:17, Konstantin Belousov wrote:
>>>>>>>> On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:
>>>>>>>>> On 8/20/23, Alexander Leidinger <alexan...@leidinger.net> wrote:
>>>>>>>>>> On 2023-08-20 22:02, Mateusz Guzik wrote:
>>>>>>>>>>> On 8/20/23, Alexander Leidinger <alexan...@leidinger.net> wrote:
>>>>>>>>>>>> On 2023-08-20 19:10, Mateusz Guzik wrote:
>>>>>>>>>>>>> On 8/18/23, Alexander Leidinger <alexan...@leidinger.net> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have a 51MB text file, compressed to about 1MB. Are you
>>>>>>>>>>>>>> interested to get it?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Your problem is not the vnode limit, but nullfs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
>>>>>>>>>>>>
>>>>>>>>>>>> 122 nullfs mounts on this system. And every jail I set up has
>>>>>>>>>>>> several null mounts. One basesystem mounted into every jail, and
>>>>>>>>>>>> then shared ports (packages/distfiles/ccache) across all of them.
>>>>>>>>>>>>
>>>>>>>>>>>>> First, some of the contention is the notorious VI_LOCK needed in
>>>>>>>>>>>>> order to do anything.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But more importantly, the mind-boggling off-CPU time comes from
>>>>>>>>>>>>> exclusive locking which should not be there to begin with -- as
>>>>>>>>>>>>> in, that xlock in stat should be a slock.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe I'm going to look into it later.
>>>>>>>>>>>>
>>>>>>>>>>>> That would be fantastic.
>>>>>>>>>>>
>>>>>>>>>>> I did a quick test, things are shared locked as expected.
>>>>>>>>>>>
>>>>>>>>>>> However, I found the following:
>>>>>>>>>>>
>>>>>>>>>>>     if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
>>>>>>>>>>>             mp->mnt_kern_flag |=
>>>>>>>>>>>                 lowerrootvp->v_mount->mnt_kern_flag &
>>>>>>>>>>>                 (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
>>>>>>>>>>>                 MNTK_EXTENDED_SHARED);
>>>>>>>>>>>     }
>>>>>>>>>>>
>>>>>>>>>>> Are you using the "nocache" option? It has a side effect of
>>>>>>>>>>> xlocking.
>>>>>>>>>>
>>>>>>>>>> I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
>>>>>>>>>
>>>>>>>>> If you don't have "nocache" on null mounts, then I don't see how
>>>>>>>>> this could happen.
>>>>>>>>
>>>>>>>> There is also MNTK_NULL_NOCACHE on the lower fs, which is currently
>>>>>>>> set for fuse and nfs at least.
>>>>>>>
>>>>>>> 11 of those 122 nullfs mounts are ZFS datasets which are also NFS
>>>>>>> exported. 6 of those nullfs mounts are also exported via Samba. The
>>>>>>> NFS exports shouldn't be needed anymore, I will remove them.
>>>>>>
>>>>>> By nfs I meant the nfs client, not nfs exports.
>>>>>
>>>>> No NFS client mounts anywhere on this system. So where is this
>>>>> exclusive lock coming from then...
>>>>>
>>>>> This is a ZFS system. 2 pools: one for the root, one for anything I
>>>>> need space for. Both pools reside on the same disks. The root pool is
>>>>> a 3-way mirror, the "space pool" is a 5-disk raidz2. All jails are on
>>>>> the space pool. The jails are all basejail-style jails.
>>>>
>>>> While I don't see why xlocking happens, you should be able to dtrace
>>>> or printf your way into finding out.
>>>
>>> dtrace looks to me like a faster approach to get to the root cause than
>>> printf... My first naive try is to detect exclusive locks. I'm not 100%
>>> sure I got it right, but at least dtrace doesn't complain about it:
>>>
>>> ---snip---
>>> #pragma D option dynvarsize=32m
>>>
>>> fbt:nullfs:null_lock:entry
>>> /args[0]->a_flags & 0x080000 != 0/
>>> {
>>>     stack();
>>> }
>>> ---snip---
>>>
>>> In which direction should I look with dtrace if this works in tonight's
>>> run of periodic? I don't have enough knowledge about VFS to come up
>>> with some immediate ideas.
>>
>> After your sysctl fix for maxvnodes I increased the number of vnodes
>> 10 times compared to the initial report.
>> This has increased the speed of the operation; the find runs in all
>> those jails finished today after ~5h (@~8am) instead of in the
>> afternoon as before. Could this suggest that in parallel some
>> null_reclaim() is running which does the exclusive locks and slows
>> down the entire operation?
>
> That may be a slowdown to some extent, but the primary problem is
> exclusive vnode locking for stat lookup, which should not be happening.
With -current as of 2023-09-03 (and right now 2023-09-11), the periodic
daily runs are down to less than an hour... and this didn't happen
directly after switching to 2023-09-03. First it went down to 4h, then
down to 1h, without any update of the OS. The only thing I did was
modify the number of maxvnodes: first to some huge amount after your
commit in the sysctl-handling part, then, after noticing way more
freevnodes than configured, down to 500000000.
Bye, Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org   netch...@freebsd.org  : PGP 0x8F31830F9F2772BF