On Mon, Jan 16, 2017 at 07:34:43PM +0000, Alexander Bluhm wrote:
> On Mon, Jan 09, 2017 at 11:55:55PM +0100, Alexander Bluhm wrote:
> > On Thu, Dec 22, 2016 at 01:38:17AM +0100, Mateusz Guzik wrote:
> > > In this particular case, what happens if the access results in a page
> > > fault and the area comes from a nfs mapped file? If network i/o is done
> > > from the same context, this should result in 'locking against myself'
> > > assertion failure.
> > 
> > I have written a program the sets a sysctl value from a memory
> > mapped file mounted on NFS.  As expected it panics when NET_LOCK()
> > is enabled.
> 
> I was wondering why my test program did only crash during copyin
> but never for copyout.  For sysctl(2) the copyout does never sleep
> as the memory is wired.  This is done to avoid races when collecting
> data for the oldp.

Once more, I'm not familiar with the codebase, only skimmed through so
apologies for mistakes (if any).

I think the current code is buggy and the approach needs to be adjusted
to work.

On a more SMP-ified kernel a concurrently running thread can munlock the
area or mmap something over it. Then vsunlock executed over it will
cause trouble. The same checks used during munlock have to be repeated.

Even the current kernel has the problem - since NET_LOCK is a sleepable
lock, the caller can be put off the cpu and the newly scheduled thread
can do aforementioned shenaningans. Likely there is a similar story for
other sysctl handlers.

With spurious vsunlock taken care off, we are back to recursive locking
due to the area not faulted in.

If you want to stick to holding NET_LOCK over copyin/copyout, I suggest
introducing _nofault variants - that is, instead of faulting the page
in, just return an error. The kernel did what it should by wiring
appropriate pages. If you play games by changing the same mappings, the
failure is on you. I.e. it will not affect legitimate programs.

However, this slows down the general sysctl interface for a corner case.
At the very least, a dedicated helper for wiring pages can be introduced
and it would be executed by sysctls interested in the facility.


> 
> If I implement the same trick for newp, I can avoid the "netlock
> locking against myself" with sysctl on memory mapped files over
> NFS.  Of course other copyin/copyout paths like pf(4) ioctl(2) still
> have to be checked.  IPsec pfkey seem to use the sysctl mechanism.
> 
> Note that the variable dolock was always 1, so I removed it.
> 
> ok?
> 
> bluhm
> 
> Index: kern/kern_sysctl.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_sysctl.c,v
> retrieving revision 1.320
> diff -u -p -r1.320 kern_sysctl.c
> --- kern/kern_sysctl.c        11 Nov 2016 18:59:09 -0000      1.320
> +++ kern/kern_sysctl.c        16 Jan 2017 18:49:20 -0000
> @@ -157,7 +157,7 @@ sys_sysctl(struct proc *p, void *v, regi
>               syscallarg(void *) new;
>               syscallarg(size_t) newlen;
>       } */ *uap = v;
> -     int error, dolock = 1;
> +     int error;
>       size_t savelen = 0, oldlen = 0;
>       sysctlfn *fn;
>       int name[CTL_MAXNAME];
> @@ -219,30 +219,41 @@ sys_sysctl(struct proc *p, void *v, regi
>       if (SCARG(uap, oldlenp) &&
>           (error = copyin(SCARG(uap, oldlenp), &oldlen, sizeof(oldlen))))
>               return (error);
> -     if (SCARG(uap, old) != NULL) {
> +     if (SCARG(uap, old) != NULL || SCARG(uap, new) != NULL) {
>               if ((error = rw_enter(&sysctl_lock, RW_WRITE|RW_INTR)) != 0)
>                       return (error);
> -             if (dolock) {
> -                     if (atop(oldlen) > uvmexp.wiredmax - uvmexp.wired) {
> -                             rw_exit_write(&sysctl_lock);
> -                             return (ENOMEM);
> -                     }
> -                     error = uvm_vslock(p, SCARG(uap, old), oldlen,
> -                         PROT_READ | PROT_WRITE);
> -                     if (error) {
> -                             rw_exit_write(&sysctl_lock);
> -                             return (error);
> -                     }
> +     }
> +     if (SCARG(uap, old) != NULL) {
> +             if (atop(oldlen) > uvmexp.wiredmax - uvmexp.wired) {
> +                     error = ENOMEM;
> +                     goto unlock;
>               }
> +             error = uvm_vslock(p, SCARG(uap, old), oldlen,
> +                 PROT_READ | PROT_WRITE);
> +             if (error)
> +                     goto unlock;
>               savelen = oldlen;
>       }
> +     if (SCARG(uap, new) != NULL) {
> +             if (atop(SCARG(uap, newlen)) > uvmexp.wiredmax - uvmexp.wired) {
> +                     error = ENOMEM;
> +                     goto unwire;
> +             }
> +             error = uvm_vslock(p, SCARG(uap, new), SCARG(uap, newlen),
> +                 PROT_READ | PROT_WRITE);
> +             if (error)
> +                     goto unwire;
> +     }
>       error = (*fn)(&name[1], SCARG(uap, namelen) - 1, SCARG(uap, old),
>           &oldlen, SCARG(uap, new), SCARG(uap, newlen), p);
> -     if (SCARG(uap, old) != NULL) {
> -             if (dolock)
> -                     uvm_vsunlock(p, SCARG(uap, old), savelen);
> +     if (SCARG(uap, new) != NULL)
> +             uvm_vsunlock(p, SCARG(uap, new), SCARG(uap, newlen));
> + unwire:
> +     if (SCARG(uap, old) != NULL)
> +             uvm_vsunlock(p, SCARG(uap, old), savelen);
> + unlock:
> +     if (SCARG(uap, old) != NULL || SCARG(uap, new) != NULL)
>               rw_exit_write(&sysctl_lock);
> -     }
>       if (error)
>               return (error);
>       if (SCARG(uap, oldlenp))

-- 
Mateusz Guzik <mjguzik gmail.com>

Reply via email to