tomaswolf commented on PR #530: URL: https://github.com/apache/mina-sshd/pull/530#issuecomment-2242903280
> In chacha20-poly1305, would it make sense to use a different mechanism for `unpackIntLE` and `packIntLE`, such as using `Unsafe` or `VarHandle`?

`VarHandle` doesn't exist in Java 8. But `ByteBuffer.wrap(someByteArray).order(ByteOrder.LITTLE_ENDIAN).asIntBuffer()` does.

However, using `IntBuffer.put()` instead of `packIntLE()`, and processing `int`s via `IntBuffer`s, makes matters _worse_ on Java 8 (a slow-down of 20-30% compared to the version in this PR at commit e152cc3). Benchmarked with an older JDK 8 (1.8.0_201) and with a brand-new 1.8.0_422. On Java 11, benchmarking shows a speed improvement of 5%, on Java 17 of 9%. (Again, with commit e152cc3 as baseline.)

The problem is that on Java 8 the `ByteBufferAsIntBufferL` implementation ends up doing exactly the same as our `packIntLE`/`unpackIntLE` routines. So trying to process `int`s for whole blocks is going to do lots of conversions to assemble bytes into `int`s, or to split an `int` into its four bytes. So it's definitely not worth it on Java 8.

On Java 11 or newer, `ByteBufferAsIntBufferL` is much more efficient and can write or read whole `int`s. Only then does such a change make sense.

But we could make the choice at run-time, based on the Java version we're running on, using `IntBuffer`s only for Java >= 11. Or we could try a multi-release JAR for sshd-common (and sshd-osgi). Either way, I'd prefer not to attempt that in this PR. If we do something like this, let's do it in some later commit.

> ...multithreaded implementation of chacha20-poly1305...

The idea is that one can pre-compute the ChaCha20 key stream asynchronously. This may indeed give some improvements for file transfers. An implementation might not be quite trivial, though, and it may mean that we'd need at least one extra thread for every connection that uses ChaCha20-Poly1305. That might be a problem for a server, or for a client doing many connections in parallel.
Maybe it would need another fixed-size thread pool, and fall back to compute the key stream synchronously if not precomputed yet when it's needed. In any case, not trivial to get right. The Poly1305 mac computation cannot be offloaded in that way.
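
For illustration only, here is a minimal sketch of the two ways of handling little-endian `int`s compared in this comment. It is not the code from the PR: the bodies of `packIntLE` and `unpackIntLE` are assumed (only the names come from the discussion), and `littleEndianView` and `javaMajorVersion` are hypothetical helpers. The version check is one possible way to make the run-time choice; it shows nothing about performance, only that both variants produce the same bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LittleEndianInts {

    // Manual variant (assumed implementation): split one int into four bytes,
    // least significant byte first.
    static void packIntLE(int value, byte[] dst, int off) {
        dst[off] = (byte) value;
        dst[off + 1] = (byte) (value >>> 8);
        dst[off + 2] = (byte) (value >>> 16);
        dst[off + 3] = (byte) (value >>> 24);
    }

    // Manual variant (assumed implementation): assemble four bytes into one int.
    static int unpackIntLE(byte[] src, int off) {
        return (src[off] & 0xFF)
                | (src[off + 1] & 0xFF) << 8
                | (src[off + 2] & 0xFF) << 16
                | (src[off + 3] & 0xFF) << 24;
    }

    // Buffer variant: a little-endian int view over the same byte array.
    // Writes through the view land in the backing array.
    static IntBuffer littleEndianView(byte[] data) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).asIntBuffer();
    }

    // Hypothetical helper for a run-time choice: "1.8" on Java 8, "11", "17", ... later.
    static int javaMajorVersion() {
        String spec = System.getProperty("java.specification.version", "1.8");
        if (spec.startsWith("1.")) {
            spec = spec.substring(2);
        }
        return Integer.parseInt(spec);
    }

    public static void main(String[] args) {
        // The four ChaCha20 constant words; in little-endian byte order they
        // spell an ASCII string.
        int[] words = { 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574 };

        byte[] manual = new byte[words.length * 4];
        for (int i = 0; i < words.length; i++) {
            packIntLE(words[i], manual, 4 * i);
        }

        byte[] viaBuffer = new byte[words.length * 4];
        littleEndianView(viaBuffer).put(words);

        System.out.println(Arrays.equals(manual, viaBuffer)); // prints "true"
        System.out.println(new String(manual, StandardCharsets.US_ASCII)); // prints "expand 32-byte k"

        // Sketch of the run-time choice: IntBuffers only on Java 11 or newer.
        boolean useIntBuffers = javaMajorVersion() >= 11;
        System.out.println("use IntBuffer path: " + useIntBuffers);
    }
}
```

Whether the `IntBuffer` path actually pays off would still have to be measured per JDK, as with the benchmarks mentioned above.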