To prevent unnecessary branching, mark the exit condition of the
primary loop in crypto_inc() as likely(), given that a carry out of a
32-bit counter word occurs only very rarely. This lets the compiler
lay out the code so that the common, carry-free case falls straight
through.
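
For reference, with this change applied the word-wise loop looks
roughly like this (a paraphrased sketch of crypto_inc() in
crypto/algapi.c, not a verbatim copy; the byte-wise fallback helper is
only referenced):

    void crypto_inc(u8 *a, unsigned int size)
    {
            __be32 *b = (__be32 *)(a + size);       /* counter is big endian, walk it from the end */
            u32 c;

            if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
                IS_ALIGNED((unsigned long)b, __alignof__(*b)))
                    for (; size >= 4; size -= 4) {
                            c = be32_to_cpu(*--b) + 1;      /* load 32-bit word, increment */
                            *b = cpu_to_be32(c);            /* store it back */
                            if (likely(c))                  /* no carry out: done */
                                    return;
                    }

            /* unaligned buffers and any remaining bytes take the byte-wise path */
            crypto_inc_byte(a, size);
    }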

On arm64, the resulting code emitted by GCC looks as follows:

     9a8:   cmp     w1, #0x3
     9ac:   add     x3, x0, w1, uxtw
     9b0:   b.ls    9e0 <crypto_inc+0x38>
     9b4:   ldr     w2, [x3,#-4]!
     9b8:   rev     w2, w2
     9bc:   add     w2, w2, #0x1
     9c0:   rev     w4, w2
     9c4:   str     w4, [x3]
     9c8:   cbz     w2, 9d0 <crypto_inc+0x28>
     9cc:   ret

where the two remaining branch conditions (one for size < 4 and one for
the carry) are statically predicted as non-taken, so in the vast
majority of cases, i.e., whenever size >= 4 and the increment does not
carry, execution falls straight through both branches.
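
Note that likely() itself is only a static hint to the compiler;
leaving aside the branch-profiling variant, it is defined along these
lines:

    /* include/linux/compiler.h (simplified) */
    #define likely(x)       __builtin_expect(!!(x), 1)
    #define unlikely(x)     __builtin_expect(!!(x), 0)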

Also, replace the open-coded alignment test with IS_ALIGNED().
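
IS_ALIGNED() performs the same mask-and-compare as the open-coded
expression, just more legibly; its definition is along the lines of:

    /* include/linux/kernel.h (approximate) */
    #define IS_ALIGNED(x, a)        (((x) & ((typeof(x))(a) - 1)) == 0)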

Cc: Jason A. Donenfeld <ja...@zx2c4.com>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 crypto/algapi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/crypto/algapi.c b/crypto/algapi.c
index 6b52e8f0b95f..9eed4ef9c971 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -963,11 +963,11 @@ void crypto_inc(u8 *a, unsigned int size)
        u32 c;
 
        if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
-           !((unsigned long)b & (__alignof__(*b) - 1)))
+           IS_ALIGNED((unsigned long)b, __alignof__(*b)))
                for (; size >= 4; size -= 4) {
                        c = be32_to_cpu(*--b) + 1;
                        *b = cpu_to_be32(c);
-                       if (c)
+                       if (likely(c))
                                return;
                }
 
-- 
2.7.4
