> Date: Fri, 12 May 2017 13:34:38 +0000
> From: Visa Hankala <v...@openbsd.org>
> 
> On Mon, May 01, 2017 at 06:02:24PM +0200, Mark Kettenis wrote:
> > The futex(2) syscall needs to be able to atomically copy the futex in
> > and out of userland.  The current implementation uses copyin(9) and
> > copyout(9) for that.  The futex is a 32-bit integer, and currently our
> > copyin(9) and copyout(9) don't guarantee an atomic 32-bit access.
> > Previously mpi@ and I discussed implementing new interfaces that do
> > guarantee the required atomicity.  However, it oocurred to me that we
> > could simply change our copyin implementations such that they
> > guarantee atomicity of a properly aligned 32-bit copyin and copyout.
> > 
> > The i386 version of these calls uses "rep movsl", which means it is
> > already atomic.  At least that is how I interpret 8.2.4 in Volume 3A
> > of the Intel SDM.  The diff below makes the amd64 version safe as
> > well.  This does introduce a few additional instructions in the loop.
> > Apparently modern Intel CPUs optimize the string loops.  If we can
> > rely on the hardware to turn 32-bit moves into 64-bit moves, we could
> > simplify the code by using "rep movsl" instead of "rep movsq".
> 
> I submit to this approach. OK visa@

So I'm partly backtracking here...

I tried to convert a few more architectures, and things get a bit
messier than I envisioned on architectures that have an optimized copy
function and care about alignment.  Especially when copyin(9) is
implemented in assembly.  So I implemented a new function called
copyin_futex(9), which is all we really need.  The implementations for
powerpc and sparc64 below have been (lightly) tested.  The regression
test that mpi@ created passes and doesn't crash the kernel!  I have an
(untested) implementation for hppa as well.  The amd64 and i386
implementation can simply be a wrapper around copyin(9).  The same
holds for all out non-MULTIPROCESSOR architectures.  So my idea is to
let kern/sys_futex.c provide that wrapper if MULTIPROCESSOR isn't
defined and add an #ifdef MULTIPROCESSOR around all MD copyin_futex()
implementations.

Thoughts?


Index: arch/powerpc/powerpc/pmap.c
===================================================================
RCS file: /cvs/src/sys/arch/powerpc/powerpc/pmap.c,v
retrieving revision 1.166
diff -u -p -r1.166 pmap.c
--- arch/powerpc/powerpc/pmap.c 19 Oct 2016 08:28:20 -0000      1.166
+++ arch/powerpc/powerpc/pmap.c 14 May 2017 19:39:54 -0000
@@ -1792,6 +1802,30 @@ copyout(const void *kaddr, void *udaddr,
                kaddr += l;
                len -= l;
        }
+       curpcb->pcb_onfault = oldh;
+       return 0;
+}
+
+int
+copyin_futex(const uint32_t *udaddr, uint32_t *kaddr)
+{
+       volatile uint32_t *p;
+       u_int32_t oldsr;
+       faultbuf env;
+       void *oldh = curpcb->pcb_onfault;
+
+       if ((u_int)udaddr & 0x3)
+               return EFAULT;
+
+       p = PPC_USER_ADDR + ((u_int)udaddr & ~PPC_SEGMENT_MASK);
+       oldsr = pmap_setusr(curpcb->pcb_pm, (vaddr_t)udaddr);
+       if (setfault(&env)) {
+               pmap_popusr(oldsr);
+               curpcb->pcb_onfault = oldh;
+               return EFAULT;
+       }
+       *kaddr = *p;
+       pmap_popusr(oldsr);
        curpcb->pcb_onfault = oldh;
        return 0;
 }
Index: arch/sparc64/sparc64/locore.s
===================================================================
RCS file: /cvs/src/sys/arch/sparc64/sparc64/locore.s,v
retrieving revision 1.185
diff -u -p -r1.185 locore.s
--- arch/sparc64/sparc64/locore.s       30 Apr 2017 16:45:45 -0000      1.185
+++ arch/sparc64/sparc64/locore.s       14 May 2017 19:39:54 -0000
@@ -5968,6 +5968,21 @@ Lcopyout_done:
        retl                    ! New instr
         clr    %o0                     ! return 0
 
+ENTRY(copyin_futex)
+       andcc   %o0, 0x3, %g0
+       bnz,pn  %xcc, Lcopyfault
+        nop
+       GET_CPCB(%o3)
+       set     Lcopyfault, %o4
+       membar  #Sync
+       stx     %o4, [%o3 + PCB_ONFAULT]
+       lduwa   [%o0] ASI_AIUS, %o2
+       stw     %o2, [%o1]
+       membar  #Sync
+       stx     %g0, [%o3 + PCB_ONFAULT]
+       retl
+        clr    %o0
+
 ! Copyin or copyout fault.  Clear cpcb->pcb_onfault and return EFAULT.
 ! Note that although we were in bcopy, there is no state to clean up;
 ! the only special thing is that we have to return to [g7 + 8] rather than
Index: kern/sys_futex.c
===================================================================
RCS file: /cvs/src/sys/kern/sys_futex.c,v
retrieving revision 1.2
diff -u -p -r1.2 sys_futex.c
--- kern/sys_futex.c    30 Apr 2017 10:10:21 -0000      1.2
+++ kern/sys_futex.c    14 May 2017 19:39:54 -0000
@@ -31,6 +31,8 @@
 #include <sys/ktrace.h>
 #endif
 
+int    copyin_futex(const uint32_t *, uint32_t *);
+
 /*
  * Kernel representation of a futex.
  */
@@ -187,10 +189,8 @@ futex_wait(uint32_t *uaddr, uint32_t val
 
        /*
         * Read user space futex value
-        *
-        * XXX copyin(9) is not guaranteed to be atomic.
         */
-       if ((error = copyin(uaddr, &cval, sizeof(cval))))
+       if ((error = copyin_futex(uaddr, &cval)))
                return error;
 
        /* If the value changed, stop here. */

Reply via email to