Re: [PATCH] x86/uaccess: use unrolled string copy for short strings

2017-06-22 Thread Paolo Abeni
On Thu, 2017-06-22 at 10:30 -0700, Linus Torvalds wrote:
> So if you want to do this optimization, I'd argue that you should just
> do it inside the copy_user_enhanced_fast_string() function itself, the
> same way we already handle the really small case specially in
> copy_user_generic_string().
> 
> And do *not* use the unrolled code, which isn't used for small copies
> anyway - rewrite the "copy_user_generic_unrolled" function in that
> same asm file to have the non-unrolled cases (label "17" and forward)
> accessible, so that you don't bother re-testing the size.

Thank you for the feedback.

I'm quite new to core x86 land; the 'rep stosb' cost popped up while I was
working on the networking code. I'll try to dig into the asm.

Regards,

Paolo


Re: [PATCH] x86/uaccess: use unrolled string copy for short strings

2017-06-22 Thread Linus Torvalds
On Wed, Jun 21, 2017 at 4:09 AM, Paolo Abeni  wrote:
>
> +   if (len <= 64)
> +   return copy_user_generic_unrolled(to, from, len);
> +
> /*
>  * If CPU has ERMS feature, use copy_user_enhanced_fast_string.
>  * Otherwise, if CPU has rep_good feature, use copy_user_generic_string.

NAK. Please do *not* do this. It puts the check in completely the
wrong place for several reasons:

 (a) it puts it in the inlined caller case (which could be ok for
constant sizes, but not in general)

 (b) it uses the copy_user_generic_unrolled() function that will then
just test the size *AGAIN* against small cases.

so it's both bigger than necessary, and stupid.

So if you want to do this optimization, I'd argue that you should just
do it inside the copy_user_enhanced_fast_string() function itself, the
same way we already handle the really small case specially in
copy_user_generic_string().

And do *not* use the unrolled code, which isn't used for small copies
anyway - rewrite the "copy_user_generic_unrolled" function in that
same asm file to have the non-unrolled cases (label "17" and forward)
accessible, so that you don't bother re-testing the size.

 Linus
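
A rough C-level analogue of the structure Linus is describing - the real
change belongs in the asm of arch/x86/lib/copy_user_64.S, so both function
bodies below are illustrative stand-ins (a byte loop and memcpy()), not
kernel code:

    #include <string.h>

    /* Stand-in for the non-unrolled byte-copy tail (label "17" and
     * forward) that copy_user_generic_unrolled() already contains. */
    static unsigned copy_short_string(void *to, const void *from, unsigned len)
    {
            char *d = to;
            const char *s = from;

            while (len--)
                    *d++ = *s++;
            return 0;               /* 0 = nothing left uncopied */
    }

    unsigned copy_user_enhanced_fast_string(void *to, const void *from,
                                            unsigned len)
    {
            /* Branch to the shared short-copy tail up front, instead of
             * calling a second function that tests the length again. */
            if (len <= 64)
                    return copy_short_string(to, from, len);

            memcpy(to, from, len);  /* stand-in for the 'rep movsb' path */
            return 0;
    }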


Re: [PATCH] x86/uaccess: use unrolled string copy for short strings

2017-06-22 Thread Paolo Abeni
On Thu, 2017-06-22 at 10:47 +0200, Ingo Molnar wrote:
> * Paolo Abeni  wrote:
> 
> > The 'rep' prefix suffers from a significant setup cost; as a result,
> > for short strings, copies with unrolled loops are faster than even the
> > optimized 'rep'-based string copy.
> > 
> > This change updates __copy_user_generic() to use the unrolled version
> > for small string lengths. The threshold length for short strings - 64 -
> > was selected empirically as the largest value that still ensures a
> > measurable gain.
> > 
> > A micro-benchmark of __copy_from_user() with different lengths shows
> > the following:
> > 
> > string len  vanilla patched delta
> > bytes   ticks   ticks   tick(%)
> > 
> > 0   58  26  32(55%)
> > 1   49  29  20(40%)
> > 2   49  31  18(36%)
> > 3   49  32  17(34%)
> > 4   50  34  16(32%)
> > 5   49  35  14(28%)
> > 6   49  36  13(26%)
> > 7   49  38  11(22%)
> > 8   50  31  19(38%)
> > 9   51  33  18(35%)
> > 10  52  36  16(30%)
> > 11  52  37  15(28%)
> > 12  52  38  14(26%)
> > 13  52  40  12(23%)
> > 14  52  41  11(21%)
> > 15  52  42  10(19%)
> > 16  51  34  17(33%)
> > 17  51  35  16(31%)
> > 18  52  37  15(28%)
> > 19  51  38  13(25%)
> > 20  52  39  13(25%)
> > 21  52  40  12(23%)
> > 22  51  42  9(17%)
> > 23  51  46  5(9%)
> > 24  52  35  17(32%)
> > 25  52  37  15(28%)
> > 26  52  38  14(26%)
> > 27  52  39  13(25%)
> > 28  52  40  12(23%)
> > 29  53  42  11(20%)
> > 30  52  43  9(17%)
> > 31  52  44  8(15%)
> > 32  51  36  15(29%)
> > 33  51  38  13(25%)
> > 34  51  39  12(23%)
> > 35  51  41  10(19%)
> > 36  52  41  11(21%)
> > 37  52  43  9(17%)
> > 38  51  44  7(13%)
> > 39  52  46  6(11%)
> > 40  51  37  14(27%)
> > 41  50  38  12(24%)
> > 42  50  39  11(22%)
> > 43  50  40  10(20%)
> > 44  50  42  8(16%)
> > 45  50  43  7(14%)
> > 46  50  43  7(14%)
> > 47  50  45  5(10%)
> > 48  50  37  13(26%)
> > 49  49  38  11(22%)
> > 50  50  40  10(20%)
> > 51  50  42  8(16%)
> > 52  50  42  8(16%)
> > 53  49  46  3(6%)
> > 54  50  46  4(8%)
> > 55  49  48  1(2%)
> > 56  50  39  11(22%)
> > 57  50  40  10(20%)
> > 58  49  42  7(14%)
> > 59  50  42  8(16%)
> > 60  50  46  4(8%)
> > 61  50  47  3(6%)
> > 62  50  48  2(4%)
> > 63  50  48  2(4%)
> > 64  51  38  13(25%)
> > 
> > Above 64 bytes the gain fades away.
> > 
> > Very similar values are collected for __copy_to_user().
> > UDP receive performance under flood with small packets using recvfrom()
> > increases by ~5%.
> 
> What CPU model(s) were used for the performance testing and was it
> performance tested on several different types of CPUs?
> 
> Please add a comment here:
> 
> +   if (len <= 64)
> +   return copy_user_generic_unrolled(to, from, len);
> +
> 
> ... because it's not obvious at all that this is a performance
> optimization, not a correctness issue. Also explain that '64' is a number
> that we got from performance measurements.

Re: [PATCH] x86/uaccess: use unrolled string copy for short strings

2017-06-22 Thread Alan Cox
> > diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
> > index c5504b9..16a8871 100644
> > --- a/arch/x86/include/asm/uaccess_64.h
> > +++ b/arch/x86/include/asm/uaccess_64.h
> > @@ -28,6 +28,9 @@ copy_user_generic(void *to, const void *from, unsigned len)
> >  {
> > unsigned ret;
> >
> > +   if (len <= 64)
> > +   return copy_user_generic_unrolled(to, from, len);
> > +

Given this gets inlined, surely it should be bracketed with an ifdef
against optimizing for space.

Also, you give one set of benchmarks, but how do they look on different
processors - AMD v Intel, Atom v Core, Ryzen v older AMD, etc.?

Alan
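
A minimal sketch of the ifdef Alan suggests, gated on
CONFIG_CC_OPTIMIZE_FOR_SIZE (the existing Kconfig symbol for -Os builds);
the function bodies are illustrative stand-ins, not the kernel's actual
dispatch:

    #include <string.h>

    /* Illustrative stand-in for the asm routine. */
    static unsigned copy_user_generic_unrolled(void *to, const void *from,
                                               unsigned len)
    {
            memcpy(to, from, len);
            return 0;
    }

    static inline unsigned copy_user_generic(void *to, const void *from,
                                             unsigned len)
    {
    #ifndef CONFIG_CC_OPTIMIZE_FOR_SIZE
            /* The short-copy test is inlined into every caller - exactly
             * the code bloat that -Os builds try to avoid - so compile it
             * out when optimizing for space. */
            if (len <= 64)
                    return copy_user_generic_unrolled(to, from, len);
    #endif
            memcpy(to, from, len);  /* stand-in for the existing dispatch */
            return 0;
    }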


Re: [PATCH] x86/uaccess: use unrolled string copy for short strings

2017-06-22 Thread Ingo Molnar

* Paolo Abeni  wrote:

> The 'rep' prefix suffers from a significant setup cost; as a result,
> for short strings, copies with unrolled loops are faster than even the
> optimized 'rep'-based string copy.
> 
> This change updates __copy_user_generic() to use the unrolled version
> for small string lengths. The threshold length for short strings - 64 -
> was selected empirically as the largest value that still ensures a
> measurable gain.
> 
> A micro-benchmark of __copy_from_user() with different lengths shows
> the following:
> 
> string len  vanilla patched delta
> bytes   ticks   ticks   tick(%)
> 
> 0   58  26  32(55%)
> 1   49  29  20(40%)
> 2   49  31  18(36%)
> 3   49  32  17(34%)
> 4   50  34  16(32%)
> 5   49  35  14(28%)
> 6   49  36  13(26%)
> 7   49  38  11(22%)
> 8   50  31  19(38%)
> 9   51  33  18(35%)
> 10  52  36  16(30%)
> 11  52  37  15(28%)
> 12  52  38  14(26%)
> 13  52  40  12(23%)
> 14  52  41  11(21%)
> 15  52  42  10(19%)
> 16  51  34  17(33%)
> 17  51  35  16(31%)
> 18  52  37  15(28%)
> 19  51  38  13(25%)
> 20  52  39  13(25%)
> 21  52  40  12(23%)
> 22  51  42  9(17%)
> 23  51  46  5(9%)
> 24  52  35  17(32%)
> 25  52  37  15(28%)
> 26  52  38  14(26%)
> 27  52  39  13(25%)
> 28  52  40  12(23%)
> 29  53  42  11(20%)
> 30  52  43  9(17%)
> 31  52  44  8(15%)
> 32  51  36  15(29%)
> 33  51  38  13(25%)
> 34  51  39  12(23%)
> 35  51  41  10(19%)
> 36  52  41  11(21%)
> 37  52  43  9(17%)
> 38  51  44  7(13%)
> 39  52  46  6(11%)
> 40  51  37  14(27%)
> 41  50  38  12(24%)
> 42  50  39  11(22%)
> 43  50  40  10(20%)
> 44  50  42  8(16%)
> 45  50  43  7(14%)
> 46  50  43  7(14%)
> 47  50  45  5(10%)
> 48  50  37  13(26%)
> 49  49  38  11(22%)
> 50  50  40  10(20%)
> 51  50  42  8(16%)
> 52  50  42  8(16%)
> 53  49  46  3(6%)
> 54  50  46  4(8%)
> 55  49  48  1(2%)
> 56  50  39  11(22%)
> 57  50  40  10(20%)
> 58  49  42  7(14%)
> 59  50  42  8(16%)
> 60  50  46  4(8%)
> 61  50  47  3(6%)
> 62  50  48  2(4%)
> 63  50  48  2(4%)
> 64  51  38  13(25%)
> 
> Above 64 bytes the gain fades away.
> 
> Very similar values are collected for __copy_to_user().
> UDP receive performance under flood with small packets using recvfrom()
> increases by ~5%.

What CPU model(s) were used for the performance testing and was it
performance tested on several different types of CPUs?

Please add a comment here:

+   if (len <= 64)
+   return copy_user_generic_unrolled(to, from, len);
+

... because it's not obvious at all that this is a performance optimization,
not a correctness issue. Also explain that '64' is a number that we got from
performance measurements.

But in general I like the change - as long as it was measured on reasonably
modern x86 CPUs.
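
One way to address Ingo's request - the comment wording below is a
suggestion, not taken from the thread:

    /*
     * Performance optimization, not a correctness requirement: the
     * 'rep'-based copy variants pay a fixed setup cost, so short
     * copies go faster through the unrolled path. The 64-byte
     * threshold comes from micro-benchmarks: it is the largest
     * length that still showed a measurable gain.
     */
    if (len <= 64)
            return copy_user_generic_unrolled(to, from, len);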

Re: [PATCH] x86/uaccess: use unrolled string copy for short strings

2017-06-21 Thread Kees Cook
On Wed, Jun 21, 2017 at 4:09 AM, Paolo Abeni  wrote:
> The 'rep' prefix suffers from a significant setup cost; as a result,
> for short strings, copies with unrolled loops are faster than even the
> optimized 'rep'-based string copy.
>
> This change updates __copy_user_generic() to use the unrolled version
> for small string lengths. The threshold length for short strings - 64 -
> was selected empirically as the largest value that still ensures a
> measurable gain.
>
> A micro-benchmark of __copy_from_user() with different lengths shows
> the following:
>
> string len  vanilla patched delta
> bytes   ticks   ticks   tick(%)
>
> 0   58  26  32(55%)
> 1   49  29  20(40%)
> 2   49  31  18(36%)
> 3   49  32  17(34%)
> 4   50  34  16(32%)
> 5   49  35  14(28%)
> 6   49  36  13(26%)
> 7   49  38  11(22%)
> 8   50  31  19(38%)
> 9   51  33  18(35%)
> 10  52  36  16(30%)
> 11  52  37  15(28%)
> 12  52  38  14(26%)
> 13  52  40  12(23%)
> 14  52  41  11(21%)
> 15  52  42  10(19%)
> 16  51  34  17(33%)
> 17  51  35  16(31%)
> 18  52  37  15(28%)
> 19  51  38  13(25%)
> 20  52  39  13(25%)
> 21  52  40  12(23%)
> 22  51  42  9(17%)
> 23  51  46  5(9%)
> 24  52  35  17(32%)
> 25  52  37  15(28%)
> 26  52  38  14(26%)
> 27  52  39  13(25%)
> 28  52  40  12(23%)
> 29  53  42  11(20%)
> 30  52  43  9(17%)
> 31  52  44  8(15%)
> 32  51  36  15(29%)
> 33  51  38  13(25%)
> 34  51  39  12(23%)
> 35  51  41  10(19%)
> 36  52  41  11(21%)
> 37  52  43  9(17%)
> 38  51  44  7(13%)
> 39  52  46  6(11%)
> 40  51  37  14(27%)
> 41  50  38  12(24%)
> 42  50  39  11(22%)
> 43  50  40  10(20%)
> 44  50  42  8(16%)
> 45  50  43  7(14%)
> 46  50  43  7(14%)
> 47  50  45  5(10%)
> 48  50  37  13(26%)
> 49  49  38  11(22%)
> 50  50  40  10(20%)
> 51  50  42  8(16%)
> 52  50  42  8(16%)
> 53  49  46  3(6%)
> 54  50  46  4(8%)
> 55  49  48  1(2%)
> 56  50  39  11(22%)
> 57  50  40  10(20%)
> 58  49  42  7(14%)
> 59  50  42  8(16%)
> 60  50  46  4(8%)
> 61  50  47  3(6%)
> 62  50  48  2(4%)
> 63  50  48  2(4%)
> 64  51  38  13(25%)
>
> Above 64 bytes the gain fades away.
>
> Very similar values are collected for __copy_to_user().
> UDP receive performance under flood with small packets using recvfrom()
> increases by ~5%.
>
> Signed-off-by: Paolo Abeni 

Since there are no regressions here, this seems sensible to me. :)

Reviewed-by: Kees Cook 

-Kees

> ---
>  arch/x86/include/asm/uaccess_64.h | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
> index c5504b9..16a8871 100644
> ---