The 'rep' prefix has a significant "setup cost"; as a result, for
short strings, copies with unrolled loops are faster than even the
optimized string copy variants based on 'rep'.

This change updates copy_user_generic() to use the unrolled
version for short strings. The threshold length - 64 bytes - was
selected empirically as the largest value that still ensures a
measurable gain.

A micro-benchmark of __copy_from_user() with different lengths shows
the following:

string len      vanilla         patched         delta
bytes           ticks           ticks           tick(%)

0               58              26              32(55%)
1               49              29              20(40%)
2               49              31              18(36%)
3               49              32              17(34%)
4               50              34              16(32%)
5               49              35              14(28%)
6               49              36              13(26%)
7               49              38              11(22%)
8               50              31              19(38%)
9               51              33              18(35%)
10              52              36              16(30%)
11              52              37              15(28%)
12              52              38              14(26%)
13              52              40              12(23%)
14              52              41              11(21%)
15              52              42              10(19%)
16              51              34              17(33%)
17              51              35              16(31%)
18              52              37              15(28%)
19              51              38              13(25%)
20              52              39              13(25%)
21              52              40              12(23%)
22              51              42              9(17%)
23              51              46              5(9%)
24              52              35              17(32%)
25              52              37              15(28%)
26              52              38              14(26%)
27              52              39              13(25%)
28              52              40              12(23%)
29              53              42              11(20%)
30              52              43              9(17%)
31              52              44              8(15%)
32              51              36              15(29%)
33              51              38              13(25%)
34              51              39              12(23%)
35              51              41              10(19%)
36              52              41              11(21%)
37              52              43              9(17%)
38              51              44              7(13%)
39              52              46              6(11%)
40              51              37              14(27%)
41              50              38              12(24%)
42              50              39              11(22%)
43              50              40              10(20%)
44              50              42              8(16%)
45              50              43              7(14%)
46              50              43              7(14%)
47              50              45              5(10%)
48              50              37              13(26%)
49              49              38              11(22%)
50              50              40              10(20%)
51              50              42              8(16%)
52              50              42              8(16%)
53              49              46              3(6%)
54              50              46              4(8%)
55              49              48              1(2%)
56              50              39              11(22%)
57              50              40              10(20%)
58              49              42              7(14%)
59              50              42              8(16%)
60              50              46              4(8%)
61              50              47              3(6%)
62              50              48              2(4%)
63              50              48              2(4%)
64              51              38              13(25%)

Above 64 bytes the gain fades away.

Very similar values are collected for __copy_to_user().
UDP receive performance under flood with small packets using recvfrom()
increases by ~5%.

Signed-off-by: Paolo Abeni <pab...@redhat.com>
---
 arch/x86/include/asm/uaccess_64.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index c5504b9..16a8871 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -28,6 +28,9 @@ copy_user_generic(void *to, const void *from, unsigned len)
 {
        unsigned ret;
 
+       if (len <= 64)
+               return copy_user_generic_unrolled(to, from, len);
+
        /*
         * If CPU has ERMS feature, use copy_user_enhanced_fast_string.
         * Otherwise, if CPU has rep_good feature, use copy_user_generic_string.
-- 
2.9.4
