On Tue, Jan 01, 2013 at 02:39:44PM +0100, Dominic Fandrey wrote:
> On 01/01/2013 07:51, Konstantin Belousov wrote:
> > On Tue, Jan 01, 2013 at 02:05:11AM +0100, Dominic Fandrey wrote:
> >> On 01/01/2013 01:49, Dominic Fandrey wrote:
> >>> On 01/01/2013 01:29, Chris Rees wrote:
> >>>> On 1 Jan 2013 00:01, "Dominic Fandrey" <[email protected]> wrote:
> >>>>>
> >>>>> I have a Tinderbox that I just updated to the current RELENG_9.
> >>>>> Following the update build times for packages have increased by a
> >>>>> factor between 5 and 20. I.e. I have packages that used to build in
> >>>>> 5 minutes and now take an hour.
> >>>>>
> >>>>> I'm suspecting the file system ever since I saw that the majority of CPU
> >>>>> load was caused by ls when I looked at top (more than 2 minutes of CPU
> >>>>> time were counted that moment). The majority of the time most of the CPU
> >>>>> load is caused by bsdtar, pkg_add, qmake-qt4, etc. Without exception
> >>>>> tools that access a lot of files.
> >>>>>
> >>>>> The file system on which packages are built is nullfs mounted from
> >>>>> an async mounted UFS. I turned async off, to no avail.
> >>>>>
> >>>>> /usr/src/UPDATING says that there were nullfs optimisations. So I
> >>>>> think this is where the problem originates. I might hack the tinderbox 
> >>>>> to
> >>>>> use 'ln -s' or set it up for NFS to verify this.
> >>>>
> >>>> Is your kernel newer than the Jail?  The converse causes problems.
> >>>
> >>> I ran makeJail for all jails after updating.
Did you rebuild your modules together with the new kernel ?

> >>>
> >>> I also seem to have similar problems when building in the host-system.
> >>> The unzip for openjdk-7 has just passed the 11 minutes CPU time mark.
> >>> On my notebook it takes less than 10 seconds.
> >>
> >> Just set WRKOBJDIRPREFIX to a tmpfs on the Tinderbox host system
> >> and the extract takes less than a second. Originally WRKOBJDIRPREFIX
> >> also pointed to a nullfs mount.
> >>
> >> Afterwards I pointed WRKOBJDIRPREFIX to a UFS file system (without
> >> nullfs involvement). The entire make extract took 20s.
> >>
> >> So still faster by at least factor 30 than running it on a nullfs mount
> >> (I eventually SIGINTed so I don't know how long it would've run).
> > 
> > Start providing some useful debugging information ?
> 
> That one might be interesting. It's all system time:
> 
> # time -lh make extract
> ===>  License GPLv2 accepted by the user
> ===>  Found saved configuration for openjdk-7.9.05_1
> ===>  Extracting for openjdk-7.9.05_2
> => SHA256 Checksum OK for openjdk-7u6-fcs-src-b24-09_aug_2012.zip.
> => SHA256 Checksum OK for apache-ant-1.8.4-bin.zip.
> ===>   openjdk-7.9.05_2 depends on file: /usr/local/bin/unzip - found
> ^Ctime: command terminated abnormally
>         4m29.30s real           3.03s user              4m22.55s sys
>       5008  maximum resident set size
>        135  average shared memory size
>       2932  average unshared data size
>        127  average unshared stack size
>       7772  page reclaims
>          0  page faults
>          0  swaps
>         19  block input operations
>        101  block output operations
>          0  messages sent
>          0  messages received
>         41  signals received
>       1597  voluntary context switches
>      16590  involuntary context switches

Ok, from your mount -v output, are the three nullfs mounts the only
nullfs mount ever used ?

Is it only unzip which demostrates the silly behaviour ? Or does it
happen with any program ? E.g., does ls(1) or sha1 on the nullfs mount
also slow ?

Could you try some low-tech profiling on the slow program. For instance,
you could run ktrace/kdump -R to see which syscalls are slow.

Most darkly part of your report for me, is that I also use nullfs-backed
jails both on HEAD and stable/9, with bigger scale, and I do not have
an issue. I just did
pooma32% time unzip -q 
/usr/local/arch/freebsd/distfiles/openjdk-7u6-fcs-src-b24-09_aug_2012.zip
unzip -q   3.25s user 23.77s system 78% cpu 34.482 total
over nullfs mount of
/usr/home on /usr/sfw/local8/opt/pooma32/usr/home (nullfs, local).

Please try the following patch, which changes nullfs behaviour to be
non-cached by default. You could turn on the caching with the 'mount -t
nullfs -o cache from to' mounting command. I am interested if use/non-use
of -o cache makes a difference for you.

diff --git a/sbin/mount_nullfs/mount_nullfs.c b/sbin/mount_nullfs/mount_nullfs.c
index c88db3d..aaf66e5 100644
--- a/sbin/mount_nullfs/mount_nullfs.c
+++ b/sbin/mount_nullfs/mount_nullfs.c
@@ -57,27 +57,35 @@ static const char rcsid[] =
 
 #include "mntopts.h"
 
-static struct mntopt mopts[] = {
-       MOPT_STDOPTS,
-       MOPT_END
-};
-
 int    subdir(const char *, const char *);
 static void    usage(void) __dead2;
 
 int
 main(int argc, char *argv[])
 {
-       struct iovec iov[6];
-       int ch, mntflags;
+       struct iovec *iov;
+       char *p, *val;
        char source[MAXPATHLEN];
        char target[MAXPATHLEN];
+       char errmsg[255];
+       int ch, mntflags, iovlen;
+       char nullfs[] = "nullfs";
 
+       iov = NULL;
+       iovlen = 0;
        mntflags = 0;
+       errmsg[0] = '\0';
        while ((ch = getopt(argc, argv, "o:")) != -1)
                switch(ch) {
                case 'o':
-                       getmntopts(optarg, mopts, &mntflags, 0);
+                       val = strdup("");
+                       p = strchr(optarg, '=');
+                       if (p != NULL) {
+                               free(val);
+                               *p = '\0';
+                               val = p + 1;
+                       }
+                       build_iovec(&iov, &iovlen, optarg, val, (size_t)-1);
                        break;
                case '?':
                default:
@@ -99,21 +107,16 @@ main(int argc, char *argv[])
                errx(EX_USAGE, "%s (%s) and %s are not distinct paths",
                    argv[0], target, argv[1]);
 
-       iov[0].iov_base = strdup("fstype");
-       iov[0].iov_len = sizeof("fstype");
-       iov[1].iov_base = strdup("nullfs");
-       iov[1].iov_len = strlen(iov[1].iov_base) + 1;
-       iov[2].iov_base = strdup("fspath");
-       iov[2].iov_len = sizeof("fspath");
-       iov[3].iov_base = source;
-       iov[3].iov_len = strlen(source) + 1;
-       iov[4].iov_base = strdup("target");
-       iov[4].iov_len = sizeof("target");
-       iov[5].iov_base = target;
-       iov[5].iov_len = strlen(target) + 1;
-
-       if (nmount(iov, 6, mntflags))
-               err(1, NULL);
+       build_iovec(&iov, &iovlen, "fstype", nullfs, (size_t)-1);
+       build_iovec(&iov, &iovlen, "fspath", source, (size_t)-1);
+       build_iovec(&iov, &iovlen, "target", target, (size_t)-1);
+       build_iovec(&iov, &iovlen, "errmsg", errmsg, sizeof(errmsg));
+       if (nmount(iov, iovlen, mntflags) < 0) {
+               if (errmsg[0] != 0)
+                       err(1, "%s: %s", source, errmsg);
+               else
+                       err(1, "%s", source);
+       }
        exit(0);
 }
 
diff --git a/sys/fs/nullfs/null.h b/sys/fs/nullfs/null.h
index 0878e55..4f37020 100644
--- a/sys/fs/nullfs/null.h
+++ b/sys/fs/nullfs/null.h
@@ -34,9 +34,15 @@
  * $FreeBSD$
  */
 
+#ifndef        FS_NULL_H
+#define        FS_NULL_H
+
+#define        NULLM_CACHE     0x0001
+
 struct null_mount {
        struct mount    *nullm_vfs;
        struct vnode    *nullm_rootvp;  /* Reference to root null_node */
+       uint64_t        nullm_flags;
 };
 
 #ifdef _KERNEL
@@ -80,3 +86,5 @@ MALLOC_DECLARE(M_NULLFSNODE);
 #endif /* NULLFS_DEBUG */
 
 #endif /* _KERNEL */
+
+#endif
diff --git a/sys/fs/nullfs/null_subr.c b/sys/fs/nullfs/null_subr.c
index b2c7a75..f82d738 100644
--- a/sys/fs/nullfs/null_subr.c
+++ b/sys/fs/nullfs/null_subr.c
@@ -224,6 +224,9 @@ null_nodeget(mp, lowervp, vpp)
         * provide ready to use vnode.
         */
        if (VOP_ISLOCKED(lowervp) != LK_EXCLUSIVE) {
+               KASSERT((MOUNTTONULLMOUNT(mp)->nullm_flags & NULLM_CACHE) == 0,
+                   ("lowervp %p is not excl locked and cache is disabled",
+                   lowervp));
                vn_lock(lowervp, LK_UPGRADE | LK_RETRY);
                if ((lowervp->v_iflag & VI_DOOMED) != 0) {
                        vput(lowervp);
diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c
index 7d84d51..8a5f1b9 100644
--- a/sys/fs/nullfs/null_vfsops.c
+++ b/sys/fs/nullfs/null_vfsops.c
@@ -67,6 +67,13 @@ static vfs_vget_t    nullfs_vget;
 static vfs_extattrctl_t        nullfs_extattrctl;
 static vfs_reclaim_lowervp_t nullfs_reclaim_lowervp;
 
+/* Mount options that we support. */
+static const char *nullfs_opts[] = {
+       "target",
+       "cache",
+       NULL
+};
+
 /*
  * Mount null layer
  */
@@ -86,9 +93,11 @@ nullfs_mount(struct mount *mp)
 
        if (!prison_allow(td->td_ucred, PR_ALLOW_MOUNT_NULLFS))
                return (EPERM);
-
        if (mp->mnt_flag & MNT_ROOTFS)
                return (EOPNOTSUPP);
+       if (vfs_filteropt(mp->mnt_optnew, nullfs_opts))
+               return (EINVAL);
+
        /*
         * Update is a no-op
         */
@@ -149,7 +158,7 @@ nullfs_mount(struct mount *mp)
        }
 
        xmp = (struct null_mount *) malloc(sizeof(struct null_mount),
-           M_NULLFSMNT, M_WAITOK);
+           M_NULLFSMNT, M_WAITOK | M_ZERO);
 
        /*
         * Save reference to underlying FS
@@ -187,16 +196,25 @@ nullfs_mount(struct mount *mp)
                mp->mnt_flag |= MNT_LOCAL;
                MNT_IUNLOCK(mp);
        }
+
+       vfs_flagopt(mp->mnt_optnew, "cache", &xmp->nullm_flags, NULLM_CACHE);
+
        MNT_ILOCK(mp);
-       mp->mnt_kern_flag |= lowerrootvp->v_mount->mnt_kern_flag &
-           (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED | MNTK_EXTENDED_SHARED);
+       if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
+               mp->mnt_kern_flag |= lowerrootvp->v_mount->mnt_kern_flag &
+                   (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
+                   MNTK_EXTENDED_SHARED);
+       }
        mp->mnt_kern_flag |= MNTK_LOOKUP_EXCL_DOTDOT;
        MNT_IUNLOCK(mp);
        mp->mnt_data = xmp;
        vfs_getnewfsid(mp);
-       MNT_ILOCK(xmp->nullm_vfs);
-       TAILQ_INSERT_TAIL(&xmp->nullm_vfs->mnt_uppers, mp, mnt_upper_link);
-       MNT_IUNLOCK(xmp->nullm_vfs);
+       if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
+               MNT_ILOCK(xmp->nullm_vfs);
+               TAILQ_INSERT_TAIL(&xmp->nullm_vfs->mnt_uppers, mp,
+                   mnt_upper_link);
+               MNT_IUNLOCK(xmp->nullm_vfs);
+       }
 
        vfs_mountedfrom(mp, target);
 
@@ -234,13 +252,15 @@ nullfs_unmount(mp, mntflags)
         */
        mntdata = mp->mnt_data;
        ump = mntdata->nullm_vfs;
-       MNT_ILOCK(ump);
-       while ((ump->mnt_kern_flag & MNTK_VGONE_UPPER) != 0) {
-               ump->mnt_kern_flag |= MNTK_VGONE_WAITER;
-               msleep(&ump->mnt_uppers, &ump->mnt_mtx, 0, "vgnupw", 0);
+       if ((mntdata->nullm_flags & NULLM_CACHE) != 0) {
+               MNT_ILOCK(ump);
+               while ((ump->mnt_kern_flag & MNTK_VGONE_UPPER) != 0) {
+                       ump->mnt_kern_flag |= MNTK_VGONE_WAITER;
+                       msleep(&ump->mnt_uppers, &ump->mnt_mtx, 0, "vgnupw", 0);
+               }
+               TAILQ_REMOVE(&ump->mnt_uppers, mp, mnt_upper_link);
+               MNT_IUNLOCK(ump);
        }
-       TAILQ_REMOVE(&ump->mnt_uppers, mp, mnt_upper_link);
-       MNT_IUNLOCK(ump);
        mp->mnt_data = NULL;
        free(mntdata, M_NULLFSMNT);
        return (0);
diff --git a/sys/fs/nullfs/null_vnops.c b/sys/fs/nullfs/null_vnops.c
index f530ed2..cc35d81 100644
--- a/sys/fs/nullfs/null_vnops.c
+++ b/sys/fs/nullfs/null_vnops.c
@@ -692,7 +692,22 @@ null_unlock(struct vop_unlock_args *ap)
 static int
 null_inactive(struct vop_inactive_args *ap __unused)
 {
+       struct vnode *vp;
+       struct mount *mp;
+       struct null_mount *xmp;
 
+       vp = ap->a_vp;
+       mp = vp->v_mount;
+       xmp = MOUNTTONULLMOUNT(mp);
+       if ((xmp->nullm_flags & NULLM_CACHE) == 0) {
+               /*
+                * If this is the last reference and caching of the
+                * nullfs vnodes is not enabled, then free up the
+                * vnode so as not to tie up the lower vnodes.
+                */
+               vp->v_object = NULL;
+               vrecycle(vp);
+       }
        return (0);
 }
 

Attachment: pgp6OX4zBod3D.pgp
Description: PGP signature

Reply via email to