Hi all,

On Sat, Feb 22, 2020 at 07:43:18PM +0100, Michael Weiser wrote:
> > 2. Eliminate use of rev in the armbe code.
> ...
> I've been looking at the revs and they now strike me as taking the
> easy way out anyway. They work around the implicit LE order in which

Updated code is now at
https://git.lysator.liu.se/michaelweiser/nettle/-/tree/arm-memxor-generic
and inline below for comments. It now compiles and runs the testsuite
fine on my native armv7veb when configured with:

CFLAGS="-march=armv6" LDFLAGS="-march=armv6" \
  ../configure --disable-documentation \
  --host=armv6b-unknown-linux-gnueabihf
[...]
Assembly files: arm/neon arm/v6 arm

and:

CFLAGS="-march=armv5te" LDFLAGS="-march=armv5te -Wl,--be8" \
  ../configure --disable-documentation \
  --host=armv5b-unknown-linux-gnueabihf
[...]
Assembly files: arm

The LDFLAGS "-Wl,--be8" is necessary for armv5teb to work on my system
because the system is BE8, which the gcc linker driver defaults to when
run with -march=armv6 but not with -march=armv5. Without the flag the
resulting binaries are BE32 and segfault or bus-error in ld-linux.so.3
on startup. For a (likely wrong) explanation of BE8 vs. BE32 see
https://lists.lysator.liu.se/pipermail/nettle-bugs/2018/007059.html.

A quick check can be done with file:

$ echo "int main(void) {}" > t.c
$ gcc -march=armv5te -o t t.c
$ ./t
Segmentation fault
$ file t
t: ELF 32-bit MSB shared object, ARM, EABI5 version 1 (SYSV),
dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux
3.2.0, not stripped
$ gcc -march=armv5te -Wl,--be8 -o t t.c
$ ./t
$ file t
t: ELF 32-bit MSB shared object, ARM, EABI5 BE8 version 1 (SYSV),
dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux
3.2.0, not stripped

The qemu environment is currently churning along in compilation.

Previous assembler error for reference:

$ make -j4
[...]
/usr/bin/m4 ../asm.m4 machine.m4 config.m4 memxor.asm >memxor.s
/usr/bin/m4 ../asm.m4 machine.m4 config.m4 memxor3.asm >memxor3.s
gcc -I. -DHAVE_CONFIG_H -march=armv5te -ggdb3 -Wall -W \
  -Wmissing-prototypes -Wmissing-declarations -Wstrict-prototypes \
  -Wpointer-arith -Wbad-function-cast -Wnested-externs -fpic \
  -MT memxor.o -MD -MP -MF memxor.o.d -c memxor.s
gcc -I. -DHAVE_CONFIG_H -march=armv5te -ggdb3 -Wall -W \
  -Wmissing-prototypes -Wmissing-declarations -Wstrict-prototypes \
  -Wpointer-arith -Wbad-function-cast -Wnested-externs -fpic \
  -MT memxor3.o -MD -MP -MF memxor3.o.d -c memxor3.s
memxor.s: Assembler messages:
memxor.s:126: Error: selected processor does not support `rev r3,r3' in ARM mode
memxor3.s: Assembler messages:
memxor3.s:146: Error: selected processor does not support `rev r4,r4' in ARM mode
memxor3.s:256: Error: selected processor does not support `rev r4,r4' in ARM mode

-- 
Thanks,
Michael


From 3e2118d41472842c368bb5bb56d71023b861b59d Mon Sep 17 00:00:00 2001
From: Michael Weiser <[email protected]>
Date: Sun, 23 Feb 2020 15:22:51 +0100
Subject: [PATCH] arm: Fix memxor for non-armv6+ big-endian systems

ARM assembly adjustments for big-endian systems contained armv6+-only
instructions (rev) in the generic arm memxor code. Replace those with an
actual conversion of the leftover byte store routines for big-endian
systems. This also provides a slight optimisation by removing the
additional instruction as well as increased symmetry between the
little- and big-endian implementations.
Signed-off-by: Michael Weiser <[email protected]>
---
 arm/memxor.asm  | 12 ++++++------
 arm/memxor3.asm | 27 ++++++++++++++-------------
 2 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/arm/memxor.asm b/arm/memxor.asm
index 239a4034..b802e95c 100644
--- a/arm/memxor.asm
+++ b/arm/memxor.asm
@@ -138,24 +138,24 @@ PROLOGUE(nettle_memxor)
 	adds	N, #8
 	beq	.Lmemxor_odd_done
 
-	C We have TNC/8 left-over bytes in r4, high end
+	C We have TNC/8 left-over bytes in r4, (since working upwards) low
+	C end on LE and high end on BE
 	S0ADJ	r4, CNT
 	ldr	r3, [DST]
 	eor	r3, r4
 
-	C memxor_leftover does an LSB store
-	C so we need to reverse if actually BE
-IF_BE(<	rev	r3, r3>)
-
 	pop	{r4,r5,r6}
 
 	C Store bytes, one by one.
 .Lmemxor_leftover:
+	C bring uppermost byte down for saving while preserving lower ones
+IF_BE(<	ror	r3, #24>)
 	strb	r3, [DST], #+1
 	subs	N, #1
 	beq	.Lmemxor_done
 	subs	TNC, #8
-	lsr	r3, #8
+	C bring down next byte, no need to preserve
+IF_LE(<	lsr	r3, #8>)
 	bne	.Lmemxor_leftover
 	b	.Lmemxor_bytes
 .Lmemxor_odd_done:
diff --git a/arm/memxor3.asm b/arm/memxor3.asm
index 69598e1c..76b8aae6 100644
--- a/arm/memxor3.asm
+++ b/arm/memxor3.asm
@@ -159,21 +159,21 @@ PROLOGUE(nettle_memxor3)
 	adds	N, #8
 	beq	.Lmemxor3_done
 
-	C Leftover bytes in r4, low end
+	C Leftover bytes in r4, (since working downwards) in high end on LE and
+	C low end on BE
 	ldr	r5, [AP, #-4]
 	eor	r4, r5, r4, S1ADJ ATNC
 
-	C leftover does an LSB store
-	C so we need to reverse if actually BE
-IF_BE(<	rev	r4, r4>)
-
 .Lmemxor3_au_leftover:
 	C Store a byte at a time
-	ror	r4, #24
+	C bring uppermost byte down for saving while preserving lower ones
+IF_LE(<	ror	r4, #24>)
 	strb	r4, [DST, #-1]!
 	subs	N, #1
 	beq	.Lmemxor3_done
 	subs	ACNT, #8
+	C bring down next byte, no need to preserve
+IF_BE(<	lsr	r4, #8>)
 	sub	AP, #1
 	bne	.Lmemxor3_au_leftover
 	b	.Lmemxor3_bytes
@@ -273,18 +273,19 @@ IF_BE(<	rev	r4, r4>)
 	adds	N, #8
 	beq	.Lmemxor3_done
 
-	C leftover does an LSB store
-	C so we need to reverse if actually BE
-IF_BE(<	rev	r4, r4>)
-
-	C Leftover bytes in a4, low end
-	ror	r4, ACNT
+	C Leftover bytes in r4, (since working downwards) in high end on LE and
+	C low end on BE after preparatory alignment correction
+IF_LE(<	ror	r4, ACNT>)
+IF_BE(<	ror	r4, ATNC>)
 
 .Lmemxor3_uu_leftover:
-	ror	r4, #24
+	C bring uppermost byte down for saving while preserving lower ones
+IF_LE(<	ror	r4, #24>)
 	strb	r4, [DST, #-1]!
 	subs	N, #1
 	beq	.Lmemxor3_done
 	subs	ACNT, #8
+	C bring down next byte, no need to preserve
+IF_BE(<	lsr	r4, #8>)
 	bne	.Lmemxor3_uu_leftover
 	b	.Lmemxor3_bytes
-- 
2.25.0

_______________________________________________
nettle-bugs mailing list
[email protected]
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
