I'm trying to learn a bit of ppc assembly. Below is an implementation of
_chacha_core. Seems to work, when tested on gcc112.fsffrance.org (just
put the file in the powerpc64 directory and reconfigure). This machine
is little-endian; I haven't yet tested on big-endian.
Unfortunately I don't get any accurate benchmark numbers on that
machine, but I think the speedup may be on the order of 50%. It could likely
be sped up further by processing 2, 3 or 4 blocks in parallel, similar to
recent improvements for arm and x86_64. I'd like to do that after the
simpler single-block function is properly merged.
I'm not sure where it fits under powerpc64. The code doesn't need any
cryptographic extensions, but it depends on vector instructions as well
as VSX registers (for the unaligned load and store instructions). So I'd
need advice both on the directory hierarchy and compile time
configuration, and appropriate runtime tests for fat builds.
Comments on the code highly appreciated! It's the first ppc code I've
written, and the reference manual isn't that easy to navigate. The
vector instructions seem very nice to work with, and make for a shorter
QROUND than both x86_64 SSE and ARM Neon (these suffer a bit from
the lack of a vector rotate instruction).
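For comparison, here is a minimal scalar C sketch of the same quarter
round, with each rotate written as the shift/shift/or sequence that an
instruction set without a vector rotate has to emulate. The names
rotl32 and qround are hypothetical and only for illustration; this is
not Nettle's C implementation.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits, for 0 < n < 32. Without a
   rotate instruction this costs two shifts and an or. */
static uint32_t
rotl32(uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}

/* One ChaCha quarter round on four words, in the same order of
   operations as the QROUND macro in the assembly file. */
static void
qround(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32(*d, 16);
  *c += *d; *b ^= *c; *b = rotl32(*b, 12);
  *a += *b; *d ^= *a; *d = rotl32(*d, 8);
  *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```

With the quarter-round test vector from RFC 8439, section 2.1.1
(a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567), this
gives a = 0xea2a92f4, b = 0xcb1cf8ce, c = 0x4581472e, d = 0x5881c4bb.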
Help with additional benchmarking would also be useful.
Regards,
/Niels
C powerpc64/chacha-core-internal.asm
ifelse(`
Copyright (C) 2020 Niels Möller and Torbjörn Granlund
This file is part of GNU Nettle.
GNU Nettle is free software: you can redistribute it and/or
modify it under the terms of either:
* the GNU Lesser General Public License as published by the Free
Software Foundation; either version 3 of the License, or (at your
option) any later version.
or
* the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your
option) any later version.
or both in parallel, as here.
GNU Nettle is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received copies of the GNU General Public License and
the GNU Lesser General Public License along with this program. If
not, see http://www.gnu.org/licenses/.
')
C Register usage:
C Arguments
define(`DST', `r3')
define(`SRC', `r4')
define(`ROUNDS', `r5')
C Working state
define(`X0', `v0')
define(`X1', `v1')
define(`X2', `v2')
define(`X3', `v3')
define(`ROT16', `v4')
define(`ROT12', `v5')
define(`ROT8', `v6')
define(`ROT7', `v7')
C Original input state
define(`S0', `v8')
define(`S1', `v9')
define(`S2', `v10')
define(`S3', `v11')
C QROUND(X0, X1, X2, X3)
define(`QROUND', `
C x0 += x1, x3 ^= x0, x3 lrot 16
C x2 += x3, x1 ^= x2, x1 lrot 12
C x0 += x1, x3 ^= x0, x3 lrot 8
C x2 += x3, x1 ^= x2, x1 lrot 7
vadduwm $1, $1, $2
vxor $4, $4, $1
vrlw $4, $4, ROT16
vadduwm $3, $3, $4
vxor $2, $2, $3
vrlw $2, $2, ROT12
vadduwm $1, $1, $2
vxor $4, $4, $1
vrlw $4, $4, ROT8
vadduwm $3, $3, $4
vxor $2, $2, $3
vrlw $2, $2, ROT7
')
.text
.align 4
C _chacha_core(uint32_t *dst, const uint32_t *src, unsigned rounds)
PROLOGUE(_nettle_chacha_core)
li r6, 0x10 C set up some...
li r7, 0x20 C ...useful...
li r8, 0x30 C ...offsets
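C vspltisw takes a 5-bit signed immediate (-16..15), so 16 can't be
C encoded. vrlw uses only the low 5 bits of each count, so -16 works.
vspltisw ROT16, -16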
vspltisw ROT12, 12
vspltisw ROT8, 8
vspltisw ROT7, 7
lxvw4x VSR(X0), 0, SRC
lxvw4x VSR(X1), r6, SRC
lxvw4x VSR(X2), r7, SRC
lxvw4x VSR(X3), r8, SRC
vor S0, X0, X0
vor S1, X1, X1
vor S2, X2, X2
vor S3, X3, X3
srdi ROUNDS, ROUNDS, 1
mtctr ROUNDS
.Loop:
QROUND(X0, X1, X2, X3)
C Rotate rows, to get
C 0 1 2 3
C 5 6 7 4 <<< 1
C 10 11 8 9 <<< 2
C 15 12 13 14 <<< 3
vsldoi X1, X1, X1, 4
vsldoi X2, X2, X2, 8
vsldoi X3, X3, X3, 12
QROUND(X0, X1, X2, X3)
C Inverse rotation
vsldoi X1, X1, X1, 12
vsldoi X2, X2, X2, 8
vsldoi X3, X3, X3, 4
bdnz .Loop
vadduwm X0, X0, S0
vadduwm X1, X1, S1
vadduwm X2, X2, S2
vadduwm X3, X3, S3
stxvw4x VSR(X0), 0, DST
stxvw4x VSR(X1), r6, DST
stxvw4x VSR(X2), r7, DST
stxvw4x VSR(X3), r8, DST
blr
EPILOGUE(_nettle_chacha_core)
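
The row rotation trick in the loop above can be modelled in plain C.
The sketch below treats each 4-word row of the state as one "vector",
applies the quarter round to all four columns, rotates rows 1, 2 and 3
by 1, 2 and 3 words so that the diagonals line up as columns, and then
undoes the rotation. It also includes an index-based reference for
comparison. All function names are hypothetical, and the model ignores
byte order and the exact vsldoi semantics; it only illustrates the
data movement.

```c
#include <stdint.h>
#include <string.h>

static uint32_t
rotl32(uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}

/* Quarter round on all four columns at once; each row stands in for
   one vector register. */
static void
qround_rows(uint32_t *x0, uint32_t *x1, uint32_t *x2, uint32_t *x3)
{
  for (int i = 0; i < 4; i++)
    {
      x0[i] += x1[i]; x3[i] ^= x0[i]; x3[i] = rotl32(x3[i], 16);
      x2[i] += x3[i]; x1[i] ^= x2[i]; x1[i] = rotl32(x1[i], 12);
      x0[i] += x1[i]; x3[i] ^= x0[i]; x3[i] = rotl32(x3[i], 8);
      x2[i] += x3[i]; x1[i] ^= x2[i]; x1[i] = rotl32(x1[i], 7);
    }
}

/* Rotate a row left by n words: new[i] = old[(i + n) % 4]. */
static void
rotate_row(uint32_t *row, unsigned n)
{
  uint32_t t[4];
  for (unsigned i = 0; i < 4; i++)
    t[i] = row[(i + n) % 4];
  memcpy(row, t, sizeof(t));
}

/* Column round, rotate rows, column round again (now acting on the
   diagonals), inverse rotation. */
static void
double_round(uint32_t *x)
{
  qround_rows(x, x + 4, x + 8, x + 12);
  rotate_row(x + 4, 1); rotate_row(x + 8, 2); rotate_row(x + 12, 3);
  qround_rows(x, x + 4, x + 8, x + 12);
  rotate_row(x + 4, 3); rotate_row(x + 8, 2); rotate_row(x + 12, 1);
}

/* Reference: one quarter round addressed by state index. */
static void
qround_idx(uint32_t *x, int a, int b, int c, int d)
{
  x[a] += x[b]; x[d] ^= x[a]; x[d] = rotl32(x[d], 16);
  x[c] += x[d]; x[b] ^= x[c]; x[b] = rotl32(x[b], 12);
  x[a] += x[b]; x[d] ^= x[a]; x[d] = rotl32(x[d], 8);
  x[c] += x[d]; x[b] ^= x[c]; x[b] = rotl32(x[b], 7);
}

/* Reference double round: four columns, then four diagonals. */
static void
double_round_ref(uint32_t *x)
{
  qround_idx(x, 0, 4, 8, 12);
  qround_idx(x, 1, 5, 9, 13);
  qround_idx(x, 2, 6, 10, 14);
  qround_idx(x, 3, 7, 11, 15);
  qround_idx(x, 0, 5, 10, 15);
  qround_idx(x, 1, 6, 11, 12);
  qround_idx(x, 2, 7, 8, 13);
  qround_idx(x, 3, 4, 9, 14);
}
```

Running double_round and double_round_ref on copies of the same
16-word state should leave both copies identical, which is a quick
way to check the rotation amounts and their inverses.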
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.