* Paolo Abeni <pab...@redhat.com> wrote:

> The 'rep' prefix suffers from a significant "setup cost"; as a result,
> for short strings, copies using unrolled loops are faster than even
> the optimized string copy using the 'rep' variant.
> 
> This change updates __copy_user_generic() to use the unrolled
> version for short strings. The threshold length for short
> strings (64 bytes) was selected from empirical measurements as the
> largest value that still ensures a measurable gain.
> 
> A micro-benchmark of __copy_from_user() with different lengths shows
> the following:
> 
> string len    vanilla         patched         delta
> bytes         ticks           ticks           ticks (%)
> 
> 0             58              26              32(55%)
> 1             49              29              20(40%)
> 2             49              31              18(36%)
> 3             49              32              17(34%)
> 4             50              34              16(32%)
> 5             49              35              14(28%)
> 6             49              36              13(26%)
> 7             49              38              11(22%)
> 8             50              31              19(38%)
> 9             51              33              18(35%)
> 10            52              36              16(30%)
> 11            52              37              15(28%)
> 12            52              38              14(26%)
> 13            52              40              12(23%)
> 14            52              41              11(21%)
> 15            52              42              10(19%)
> 16            51              34              17(33%)
> 17            51              35              16(31%)
> 18            52              37              15(28%)
> 19            51              38              13(25%)
> 20            52              39              13(25%)
> 21            52              40              12(23%)
> 22            51              42              9(17%)
> 23            51              46              5(9%)
> 24            52              35              17(32%)
> 25            52              37              15(28%)
> 26            52              38              14(26%)
> 27            52              39              13(25%)
> 28            52              40              12(23%)
> 29            53              42              11(20%)
> 30            52              43              9(17%)
> 31            52              44              8(15%)
> 32            51              36              15(29%)
> 33            51              38              13(25%)
> 34            51              39              12(23%)
> 35            51              41              10(19%)
> 36            52              41              11(21%)
> 37            52              43              9(17%)
> 38            51              44              7(13%)
> 39            52              46              6(11%)
> 40            51              37              14(27%)
> 41            50              38              12(24%)
> 42            50              39              11(22%)
> 43            50              40              10(20%)
> 44            50              42              8(16%)
> 45            50              43              7(14%)
> 46            50              43              7(14%)
> 47            50              45              5(10%)
> 48            50              37              13(26%)
> 49            49              38              11(22%)
> 50            50              40              10(20%)
> 51            50              42              8(16%)
> 52            50              42              8(16%)
> 53            49              46              3(6%)
> 54            50              46              4(8%)
> 55            49              48              1(2%)
> 56            50              39              11(22%)
> 57            50              40              10(20%)
> 58            49              42              7(14%)
> 59            50              42              8(16%)
> 60            50              46              4(8%)
> 61            50              47              3(6%)
> 62            50              48              2(4%)
> 63            50              48              2(4%)
> 64            51              38              13(25%)
> 
> Above 64 bytes the gain fades away.
> 
> Very similar values were collected for __copy_to_user().
> UDP receive performance under flood with small packets using recvfrom()
> increases by ~5%.
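
The delta column in the quoted table is the vanilla tick count minus the patched one, with the percentage taken relative to the vanilla cost. The listed percentages appear to be truncated rather than rounded (for 1 byte, 20/49 = 40.8% is listed as 40%). A minimal C sketch of that arithmetic, using a hypothetical `struct bench_row` that is not part of the patch:

```c
/* One row of the benchmark table (hypothetical helper type). */
struct bench_row {
    unsigned len;      /* bytes copied */
    unsigned vanilla;  /* ticks before the patch */
    unsigned patched;  /* ticks after the patch */
};

/* Absolute saving in ticks; assumes patched <= vanilla, as in every table row. */
static unsigned delta_ticks(const struct bench_row *r)
{
    return r->vanilla - r->patched;
}

/*
 * Saving as a percentage of the vanilla cost. Integer division truncates,
 * which appears to match how the table's percentages were computed.
 */
static unsigned delta_percent(const struct bench_row *r)
{
    return delta_ticks(r) * 100u / r->vanilla;
}
```

For the 0-byte row (58 vs 26 ticks) this gives 32 ticks and 55%. The patched column also drops at each multiple of 8 bytes (for example 38 ticks at 7 bytes vs 31 at 8), which would be consistent with an unrolled loop that moves 8-byte words and finishes with a byte tail; that reading is an inference from the numbers, not something the patch states.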

What CPU model(s) were used for the performance testing, and was it
performance-tested on several different types of CPUs?

Please add a comment here:

+       if (len <= 64)
+               return copy_user_generic_unrolled(to, from, len);
+

... because it's not obvious at all that this is a performance optimization,
not a correctness issue. Also explain that '64' is a number that we got from
performance measurements.
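
A commented version of the check might look roughly like the following user-space sketch. All names (`copy_user_generic_sketch`, `copy_unrolled_sketch`, `copy_rep_sketch`, `last_path`) are hypothetical stand-ins, not kernel APIs; the real routines are assembly and handle faults on user addresses, which this sketch does not. The 32-byte unroll factor is an arbitrary illustrative choice, and `memcpy` merely stands in for the 'rep movs' path. The comment wording is only a suggestion of what the requested explanation could say.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Records which path the last call took (1 = unrolled, 2 = rep); illustration only. */
static int last_path;

/* Unrolled copy: 32-byte blocks of four 8-byte words, then single words, then a byte tail. */
static unsigned long copy_unrolled_sketch(void *to, const void *from, size_t len)
{
    unsigned char *d = to;
    const unsigned char *s = from;
    uint64_t w0, w1, w2, w3;

    last_path = 1;
    while (len >= 32) {
        memcpy(&w0, s, 8);
        memcpy(&w1, s + 8, 8);
        memcpy(&w2, s + 16, 8);
        memcpy(&w3, s + 24, 8);
        memcpy(d, &w0, 8);
        memcpy(d + 8, &w1, 8);
        memcpy(d + 16, &w2, 8);
        memcpy(d + 24, &w3, 8);
        d += 32;
        s += 32;
        len -= 32;
    }
    while (len >= 8) {
        memcpy(&w0, s, 8);
        memcpy(d, &w0, 8);
        d += 8;
        s += 8;
        len -= 8;
    }
    while (len--)
        *d++ = *s++;
    return 0; /* bytes left uncopied, following the copy_*_user() convention */
}

/* Stand-in for the 'rep movs' variant; plain memcpy here. */
static unsigned long copy_rep_sketch(void *to, const void *from, size_t len)
{
    last_path = 2;
    memcpy(to, from, len);
    return 0;
}

static unsigned long copy_user_generic_sketch(void *to, const void *from, size_t len)
{
    /*
     * Performance optimization, not a correctness requirement: for short
     * copies the setup cost of the 'rep' prefix outweighs its throughput,
     * so an unrolled loop is faster. The 64-byte cut-off comes from
     * micro-benchmark measurements; above it the gain fades away.
     */
    if (len <= 64)
        return copy_unrolled_sketch(to, from, len);

    return copy_rep_sketch(to, from, len);
}
```

Note that the comparison is `<=`, so a 64-byte copy still takes the unrolled path, matching the last row of the quoted table.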

But in general I like the change - as long as it was measured on reasonably
modern x86 CPUs. I.e. it should not regress on modern Intel or AMD CPUs.

Thanks,

        Ingo
