On Thu, Jun 18, 2020 at 04:39:35PM +0000, David Laight wrote:
> From: Alexey Dobriyan 
> > Sent: 18 June 2020 14:17
> ...
> > > > diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
> > > > index fff28c6f73a2..b0dfac3d3df7 100644
> > > > --- a/arch/x86/lib/usercopy_64.c
> > > > +++ b/arch/x86/lib/usercopy_64.c
> > > > @@ -24,6 +24,7 @@ unsigned long __clear_user(void __user *addr, unsigned long size)
> > > >         asm volatile(
> > > >                 "       testq  %[size8],%[size8]\n"
> > > >                 "       jz     4f\n"
> > > > +               "       .align 16\n"
> > > >                 "0:     movq $0,(%[dst])\n"
> > > >                 "       addq   $8,%[dst]\n"
> > > >                 "       decl %%ecx ; jnz   0b\n"
> > >
> > > You can do better than that loop.
> > > Change 'dst' to point to the end of the buffer, negate the count
> > > and divide by 8 and you get:
> > >           "0:     movq $0,(%[dst],%%rcx,8)\n"
> > >           "       add $1,%%rcx\n"
> > >           "       jnz 0b\n"
> > > which might run at one iteration per clock, especially on CPUs that pair
> > > the add and jnz into a single uop.
> > > (You need to use add not inc.)
> > 
> > /dev/zero should probably use REP STOSB etc just like everything else.
> 
> Almost certainly it shouldn't, and neither should anything else.
> Potentially it could use whatever memset() is patched to.
> That MIGHT be 'rep stos' on some cpu variants, but in general
> it is slow.

Yes, that's what I meant: alternatives choosing REP variant.
memset loops are so 21-st century.
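
For readers following along, the negative-index loop David describes in the
quoted text can be sketched in portable C. This is only an illustration of the
indexing trick, not the kernel's __clear_user(): the helper name
clear_qwords_neg_index() is hypothetical, there is no __user access or fault
handling, and whether a compiler emits the three-instruction loop from the
quoted asm is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): zero `qwords` 8-byte words
 * starting at `dst`, indexing from the END of the buffer with a
 * negative counter that runs up to zero.
 */
static void clear_qwords_neg_index(uint64_t *dst, size_t qwords)
{
    assert(qwords <= PTRDIFF_MAX);      /* the count must be negatable */

    uint64_t *end = dst + qwords;       /* "dst" now points past the buffer */
    ptrdiff_t i = -(ptrdiff_t)qwords;   /* negated count, in 8-byte units */

    while (i != 0) {
        end[i] = 0;   /* plays the role of: movq $0,(end,i,8) */
        i += 1;       /* the add sets ZF at zero, so no separate compare */
    }
}
```

The point of the trick is that a single register serves as both the store
index and the loop counter, so the loop body needs one store, one add, and one
conditional branch.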
