On Mon, May 01, 2017 at 06:02:24PM +0200, Mark Kettenis wrote:
> The futex(2) syscall needs to be able to atomically copy the futex in
> and out of userland.  The current implementation uses copyin(9) and
> copyout(9) for that.  The futex is a 32-bit integer, and currently our
> copyin(9) and copyout(9) don't guarantee an atomic 32-bit access.
> Previously mpi@ and I discussed implementing new interfaces that do
> guarantee the required atomicity.  However, it occurred to me that we
> could simply change our copyin implementations such that they
> guarantee atomicity of a properly aligned 32-bit copyin and copyout.
>
> The i386 version of these calls uses "rep movsl", which means it is
> already atomic.  At least that is how I interpret 8.2.4 in Volume 3A
> of the Intel SDM.  The diff below makes the amd64 version safe as
> well.  This does introduce a few additional instructions in the loop.
> Apparently modern Intel CPUs optimize the string loops.  If we can
> rely on the hardware to turn 32-bit moves into 64-bit moves, we could
> simplify the code by using "rep movsl" instead of "rep movsq".
I agree with this approach.  OK visa@

> Index: arch/amd64/amd64/copy.S
> ===================================================================
> RCS file: /cvs/src/sys/arch/amd64/amd64/copy.S,v
> retrieving revision 1.7
> diff -u -p -r1.7 copy.S
> --- arch/amd64/amd64/copy.S	25 Apr 2015 21:31:24 -0000	1.7
> +++ arch/amd64/amd64/copy.S	1 May 2017 15:32:17 -0000
> @@ -138,7 +138,12 @@ ENTRY(copyout)
>  	rep
>  	movsq
>  	movb	%al,%cl
> -	andb	$7,%cl
> +	shrb	$2,%cl
> +	andb	$1,%cl
> +	rep
> +	movsl
> +	movb	%al,%cl
> +	andb	$3,%cl
>  	rep
>  	movsb
>  	SMAP_CLAC
> @@ -168,7 +173,12 @@ ENTRY(copyin)
>  	rep
>  	movsq
>  	movb	%al,%cl
> -	andb	$7,%cl
> +	shrb	$2,%cl
> +	andb	$1,%cl
> +	rep
> +	movsl
> +	movb	%al,%cl
> +	andb	$3,%cl
>  	rep
>  	movsb