On 2023-09-04 14:26, Mateusz Guzik wrote:
On 9/4/23, Alexander Leidinger <alexan...@leidinger.net> wrote:
On 2023-08-28 22:33, Alexander Leidinger wrote:
On 2023-08-22 18:59, Mateusz Guzik wrote:
On 8/22/23, Alexander Leidinger <alexan...@leidinger.net> wrote:
On 2023-08-21 10:53, Konstantin Belousov wrote:
On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote:
On 2023-08-20 23:17, Konstantin Belousov wrote:
> On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:
> > On 8/20/23, Alexander Leidinger <alexan...@leidinger.net> wrote:
> > > On 2023-08-20 22:02, Mateusz Guzik wrote:
> > >> On 8/20/23, Alexander Leidinger <alexan...@leidinger.net>
> > >> wrote:
> > >>> On 2023-08-20 19:10, Mateusz Guzik wrote:
> > >>>> On 8/18/23, Alexander Leidinger <alexan...@leidinger.net>
> > >>>> wrote:
> > >>>
> > >>>>> I have a 51MB text file, compressed to about 1MB. Are you
> > >>>>> interested in getting it?
> > >>>>>
> > >>>>
> > >>>> Your problem is not the vnode limit, but nullfs.
> > >>>>
> > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
> > >>>
> > >>> 122 nullfs mounts on this system. And every jail I set up has
> > >>> several null mounts: one base system mounted into every jail,
> > >>> and then shared ports (packages/distfiles/ccache) across all
> > >>> of them.
> > >>>
> > >>>> First, some of the contention is the notorious VI_LOCK taken
> > >>>> in order to do anything.
> > >>>>
> > >>>> But more importantly, the mind-boggling off-cpu time comes
> > >>>> from exclusive locking which should not be there to begin
> > >>>> with -- as in, that xlock in stat should be a slock.
> > >>>>
> > >>>> Maybe I'm going to look into it later.
> > >>>
> > >>> That would be fantastic.
> > >>>
> > >>
> > >> I did a quick test, things are shared locked as expected.
> > >>
> > >> However, I found the following:
> > >>         if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
> > >>                 mp->mnt_kern_flag |=
> > >> lowerrootvp->v_mount->mnt_kern_flag &
> > >>                     (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
> > >>                     MNTK_EXTENDED_SHARED);
> > >>         }
> > >>
> > >> Are you using the "nocache" option? It has the side effect of
> > >> xlocking.
> > >
> > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
> > >
> >
> > If you don't have "nocache" on null mounts, then I don't see how
> > this could happen.
>
> There is also MNTK_NULL_NOCACHE on lower fs, which is currently set
> for fuse and nfs at least.

11 of those 122 nullfs mounts are ZFS datasets which are also
NFS-exported. 6 of those nullfs mounts are also exported via Samba.
The NFS exports shouldn't be needed anymore; I will remove them.
By nfs I meant nfs client, not nfs exports.

No NFS client mounts anywhere on this system. So where is this
exclusive lock coming from then...
This is a ZFS system. 2 pools: one for the root, one for anything I
need space for. Both pools reside on the same disks. The root pool is
a 3-way mirror, the "space-pool" is a 5-disk raidz2. All jails are on
the space-pool. The jails are all basejail-style jails.


While I don't see why xlocking happens, you should be able to dtrace
or printf your way into finding out.

dtrace looks to me like a faster approach to get to the root cause
than printf... My first naive try is to detect exclusive locks. I'm
not 100% sure I got it right, but at least dtrace doesn't complain
about it:
---snip---
#pragma D option dynvarsize=32m

/* 0x080000 is LK_EXCLUSIVE (see sys/sys/lockmgr.h); note the
   parentheses -- `!=' binds tighter than `&' in D */
fbt:nullfs:null_lock:entry
/(args[0]->a_flags & 0x080000) != 0/
{
        stack();
}
---snip---

In which direction should I look with dtrace if this works in tonight's
run of periodic? I don't have enough knowledge about VFS to come up
with some immediate ideas.

After your sysctl fix for maxvnodes I increased the number of vnodes
tenfold compared to the initial report. This has increased the speed of
the operation; the find runs in all those jails finished today after
~5h (at ~8am) instead of in the afternoon as before. Could this suggest
that in parallel some null_reclaim() is running which does the
exclusive locks and slows down the entire operation?


That may be a slowdown to some extent, but the primary problem is
exclusive vnode locking for stat lookup, which should not be
happening.

With -current as of 2023-09-03 (and right now 2023-09-11), the periodic
daily runs are down to less than an hour... and this didn't happen
directly after switching to 2023-09-03. First it went down to 4h, then
down to 1h without any update of the OS. The only thing I did was
modify maxvnodes: first to some huge amount after your commit fixing
the sysctl handling, then, after noticing way more freevnodes than
configured, down to 500000000.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF
