Hi all,

On Sat, Feb 22, 2020 at 07:43:18PM +0100, Michael Weiser wrote:

> > 2. Eliminate use of rev in the armbe code.
> ... I've been looking at the revs and they now strike me as taking the
> easy way out anyway. They work around the implicit LE order in which

Updated code is now at
https://git.lysator.liu.se/michaelweiser/nettle/-/tree/arm-memxor-generic
and inline below for comments.

It now compiles and runs the testsuite fine on my native armv7veb when
configured with:

CFLAGS="-march=armv6" LDFLAGS="-march=armv6" \
        ../configure --disable-documentation \
                --host=armv6b-unknown-linux-gnueabihf
[...]
  Assembly files:    arm/neon arm/v6 arm

and:

CFLAGS="-march=armv5te" LDFLAGS="-march=armv5te -Wl,--be8" \
        ../configure --disable-documentation \
                --host=armv5b-unknown-linux-gnueabihf
[...]
  Assembly files:    arm

LDFLAGS "-Wl,--be8" is necessary for armv5teb to work on my system
because the system is BE8, which the gcc linker driver defaults to when
run with -march=armv6 but not with -march=armv5. Without it the
resulting binaries are BE32 and segfault or bus-error in ld-linux.so.3
on startup. For a (likely wrong) explanation of BE8 vs. BE32 see
https://lists.lysator.liu.se/pipermail/nettle-bugs/2018/007059.html.

A quick check can be done with file:

$ echo "int main(void) {}" > t.c
$ gcc -march=armv5te -o t t.c
$ ./t
Segmentation fault
$ file t
t: ELF 32-bit MSB shared object, ARM, EABI5 version 1 (SYSV),
dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux
3.2.0, not stripped
$ gcc -march=armv5te -Wl,--be8 -o t t.c
$ ./t
$ file t
t: ELF 32-bit MSB shared object, ARM, EABI5 BE8 version 1 (SYSV),
dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux
3.2.0, not stripped

The qemu environment is currently still churning through the compilation.

Previous assembler error for reference:

$ make -j4
[...]
/usr/bin/m4 ../asm.m4 machine.m4 config.m4 memxor.asm >memxor.s
/usr/bin/m4 ../asm.m4 machine.m4 config.m4 memxor3.asm >memxor3.s
gcc -I.  -DHAVE_CONFIG_H -march=armv5te -ggdb3 -Wall -W
-Wmissing-prototypes -Wmissing-declarations -Wstrict-prototypes
-Wpointer-arith -Wbad-function-cast -Wnested-externs -fpic -MT memxor.o
-MD -MP -MF memxor.o.d -c memxor.s
gcc -I.  -DHAVE_CONFIG_H -march=armv5te -ggdb3 -Wall -W
-Wmissing-prototypes -Wmissing-declarations -Wstrict-prototypes
-Wpointer-arith -Wbad-function-cast -Wnested-externs -fpic -MT memxor3.o
-MD -MP -MF memxor3.o.d -c memxor3.s
memxor.s: memxor3.s: Assembler messages:
memxor3.s:146: Error: selected processor does not support `rev r4,r4' in ARM 
mode
Assembler messages:
memxor3.s:256: Error: selected processor does not support `rev r4,r4' in ARM 
mode
memxor.s:126: Error: selected processor does not support `rev r3,r3' in ARM mode
-- 
Thanks,
Michael

From 3e2118d41472842c368bb5bb56d71023b861b59d Mon Sep 17 00:00:00 2001
From: Michael Weiser <[email protected]>
Date: Sun, 23 Feb 2020 15:22:51 +0100
Subject: [PATCH] arm: Fix memxor for non-armv6+ big-endian systems

ARM assembly adjustments for big-endian systems contained armv6+-only
instructions (rev) in generic arm memxor code. Replace those with an
actual conversion of the leftover byte store routines for big-endian
systems. This also removes one instruction per leftover store and
increases symmetry between the little- and big-endian implementations.

Signed-off-by: Michael Weiser <[email protected]>
---
 arm/memxor.asm  | 12 ++++++------
 arm/memxor3.asm | 27 ++++++++++++++-------------
 2 files changed, 20 insertions(+), 19 deletions(-)
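(Not part of the patch: a hypothetical C sketch of what the leftover
byte-store loops below do. Function names and the example word are mine;
on LE the leftover bytes sit at the low end of the register and are
stored LSB-first with lsr #8 after each strb, while on BE they sit at
the high end and are stored MSB-first with ror #24 before each strb,
which is what lets the patch drop the armv6+-only rev.)

```c
#include <stdint.h>
#include <stddef.h>

/* LE leftover store: strb of the low byte, then lsr #8
   brings down the next byte. */
static void store_leftover_le(uint8_t *dst, uint32_t w, size_t n)
{
  while (n--) {
    *dst++ = (uint8_t) w;     /* strb rX, [DST], #+1 */
    w >>= 8;                  /* lsr rX, #8 */
  }
}

/* BE leftover store: ror #24 (i.e. rotate left by 8) brings the
   uppermost byte down for the strb while preserving the lower ones. */
static void store_leftover_be(uint8_t *dst, uint32_t w, size_t n)
{
  while (n--) {
    w = (w << 8) | (w >> 24); /* ror rX, #24 */
    *dst++ = (uint8_t) w;     /* strb rX, [DST], #+1 */
  }
}
```

For a word 0x11223344 and n = 4, the LE variant emits 44 33 22 11 and
the BE variant 11 22 33 44, matching the respective memory orders.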

diff --git a/arm/memxor.asm b/arm/memxor.asm
index 239a4034..b802e95c 100644
--- a/arm/memxor.asm
+++ b/arm/memxor.asm
@@ -138,24 +138,24 @@ PROLOGUE(nettle_memxor)
        adds    N, #8
        beq     .Lmemxor_odd_done
 
-       C We have TNC/8 left-over bytes in r4, high end
+       C We have TNC/8 left-over bytes in r4, (since working upwards) low
+       C end on LE and high end on BE
        S0ADJ   r4, CNT
        ldr     r3, [DST]
        eor     r3, r4
 
-       C memxor_leftover does an LSB store
-       C so we need to reverse if actually BE
-IF_BE(<        rev     r3, r3>)
-
        pop     {r4,r5,r6}
 
        C Store bytes, one by one.
 .Lmemxor_leftover:
+       C bring uppermost byte down for saving while preserving lower ones
+IF_BE(<        ror     r3, #24>)
        strb    r3, [DST], #+1
        subs    N, #1
        beq     .Lmemxor_done
        subs    TNC, #8
-       lsr     r3, #8
+       C bring down next byte, no need to preserve
+IF_LE(<        lsr     r3, #8>)
        bne     .Lmemxor_leftover
        b       .Lmemxor_bytes
 .Lmemxor_odd_done:
diff --git a/arm/memxor3.asm b/arm/memxor3.asm
index 69598e1c..76b8aae6 100644
--- a/arm/memxor3.asm
+++ b/arm/memxor3.asm
@@ -159,21 +159,21 @@ PROLOGUE(nettle_memxor3)
        adds    N, #8
        beq     .Lmemxor3_done
 
-       C Leftover bytes in r4, low end
+       C Leftover bytes in r4, (since working downwards) in high end on LE and
+       C low end on BE
        ldr     r5, [AP, #-4]
        eor     r4, r5, r4, S1ADJ ATNC
 
-       C leftover does an LSB store
-       C so we need to reverse if actually BE
-IF_BE(<        rev     r4, r4>)
-
 .Lmemxor3_au_leftover:
        C Store a byte at a time
-       ror     r4, #24
+       C bring uppermost byte down for saving while preserving lower ones
+IF_LE(<        ror     r4, #24>)
        strb    r4, [DST, #-1]!
        subs    N, #1
        beq     .Lmemxor3_done
        subs    ACNT, #8
+       C bring down next byte, no need to preserve
+IF_BE(<        lsr     r4, #8>)
        sub     AP, #1
        bne     .Lmemxor3_au_leftover
        b       .Lmemxor3_bytes
@@ -273,18 +273,19 @@ IF_BE(<   rev     r4, r4>)
        adds    N, #8
        beq     .Lmemxor3_done
 
-       C leftover does an LSB store
-       C so we need to reverse if actually BE
-IF_BE(<        rev     r4, r4>)
-
-       C Leftover bytes in a4, low end
-       ror     r4, ACNT
+       C Leftover bytes in r4, (since working downwards) in high end on LE and
+       C low end on BE after preparatory alignment correction
+IF_LE(<        ror     r4, ACNT>)
+IF_BE(<        ror     r4, ATNC>)
 .Lmemxor3_uu_leftover:
-       ror     r4, #24
+       C bring uppermost byte down for saving while preserving lower ones
+IF_LE(<        ror     r4, #24>)
        strb    r4, [DST, #-1]!
        subs    N, #1
        beq     .Lmemxor3_done
        subs    ACNT, #8
+       C bring down next byte, no need to preserve
+IF_BE(<        lsr     r4, #8>)
        bne     .Lmemxor3_uu_leftover
        b       .Lmemxor3_bytes
 
-- 
2.25.0

_______________________________________________
nettle-bugs mailing list
[email protected]
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs