On Mon, May 01, 2017 at 06:02:24PM +0200, Mark Kettenis wrote:
> The futex(2) syscall needs to be able to atomically copy the futex in
> and out of userland.  The current implementation uses copyin(9) and
> copyout(9) for that.  The futex is a 32-bit integer, and currently our
> copyin(9) and copyout(9) don't guarantee an atomic 32-bit access.
> Previously mpi@ and I discussed implementing new interfaces that do
> guarantee the required atomicity.  However, it occurred to me that we
> could simply change our copyin implementations such that they
> guarantee atomicity of a properly aligned 32-bit copyin and copyout.
> 
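(A side note to make the requirement concrete: roughly, a FUTEX_WAIT-style
check has to do something like the sketch below.  The variable names are
made up for illustration; this is not the actual futex code.)

	uint32_t cur;
	int error;

	/*
	 * Read the 32-bit futex word from userland and compare it
	 * against the expected value.  The copyin(9) of the word must
	 * be a single atomic 32-bit load, otherwise a concurrent store
	 * from another thread could be observed half-written.
	 */
	if ((error = copyin(uaddr, &cur, sizeof(cur))))
		return (error);
	if (cur != val)
		return (EAGAIN);
	/* ... sleep on the futex ... */
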
> The i386 version of these calls uses "rep movsl", which means it is
> already atomic.  At least that is how I interpret 8.2.4 in Volume 3A
> of the Intel SDM.  The diff below makes the amd64 version safe as
> well.  This does introduce a few additional instructions in the loop.
> Apparently modern Intel CPUs optimize the string loops.  If we can
> rely on the hardware to turn 32-bit moves into 64-bit moves, we could
> simplify the code by using "rep movsl" instead of "rep movsq".
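
(For reference, the remainder handling in the diff below computes, in C
terms, something like the following; this is just a sketch of the
arithmetic, not real code.)

	/*
	 * For a copy of len bytes: quadwords first, then at most one
	 * longword, then the trailing bytes.  A 4-byte copyin/copyout
	 * of an aligned futex word is thus done with a single movsl
	 * instead of four movsb's.
	 */
	nquads = len >> 3;		/* existing "rep movsq" count */
	nlongs = (len >> 2) & 1;	/* 0 or 1: the added "rep movsl" */
	nbytes = len & 3;		/* trailing "rep movsb" count */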

I can go along with this approach. OK visa@

> Index: arch/amd64/amd64/copy.S
> ===================================================================
> RCS file: /cvs/src/sys/arch/amd64/amd64/copy.S,v
> retrieving revision 1.7
> diff -u -p -r1.7 copy.S
> --- arch/amd64/amd64/copy.S   25 Apr 2015 21:31:24 -0000      1.7
> +++ arch/amd64/amd64/copy.S   1 May 2017 15:32:17 -0000
> @@ -138,7 +138,12 @@ ENTRY(copyout)
>       rep
>       movsq
>       movb    %al,%cl
> -     andb    $7,%cl
> +     shrb    $2,%cl
> +     andb    $1,%cl
> +     rep
> +     movsl
> +     movb    %al,%cl
> +     andb    $3,%cl
>       rep
>       movsb
>       SMAP_CLAC
> @@ -168,7 +173,12 @@ ENTRY(copyin)
>       rep
>       movsq
>       movb    %al,%cl
> -     andb    $7,%cl
> +     shrb    $2,%cl
> +     andb    $1,%cl
> +     rep
> +     movsl
> +     movb    %al,%cl
> +     andb    $3,%cl
>       rep
>       movsb
>  
> 
