I would like to help, but I have no knowledge of or experience with ARM NEON, sorry.
Regards,
Mamone

On Tue, Jul 7, 2020 at 5:46 PM Niels Möller <ni...@lysator.liu.se> wrote:

> I've written some new ARM Neon assembly for salsa20. See
>
> https://gitlab.com/gnutls/nettle/-/commit/2ac58a1ce729a6cfe1d3703f4deb6da8862909e9
> ,
> when configured with --enable-arm-neon.
>
> It interleaves the processing of two blocks, which gives a speedup of
> 50% -- 100% on the ARM cores where I've tested it. Before merging, I
> need to fix fat builds to use the new code on processors that support
> it.
>
> To make it work also on big-endian ARM, I'd need some help. (I think the
> qemu-user package supports big-endian ARM, at least, it includes a
> program named qemu-armeb. But I'm missing a cross compiler and cross
> debugger).
>
> I'd like to do the same for x86_64. And for chacha, it might give even
> greater speedup to interleave processing of three blocks, which may be
> possible since I think chacha needs fewer registers for temporaries.
>
> For both x86_64 and ARM neon, the current code uses 128-bit wide
> registers. Processors with 256-bit wide simd registers (at least 16 of
> them) could do twice as many blocks at a time.
>
> Regards,
> /Niels
>
> --
> Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
> Internet email is subject to wholesale government surveillance.
>
> _______________________________________________
> nettle-bugs mailing list
> nettle-bugs@lists.lysator.liu.se
> http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs