Re: [PR] Performance optimizations [mina-sshd]

via GitHub Mon, 22 Jul 2024 06:01:04 -0700


tomaswolf commented on PR #530:
URL: https://github.com/apache/mina-sshd/pull/530#issuecomment-2242903280


   > In chacha20-poly1305, would it make sense to use a different mechanism for 
`unpackIntLE` and `packIntLE`, such as using `Unsafe` or `VarHandle` ?
   
   `VarHandle` doesn't exist in Java 8. But 
`ByteBuffer.wrap(someByteArray).order(ByteOrder.LITTLE_ENDIAN).asIntBuffer()` 
does.
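
To make the comparison concrete, here is a minimal sketch of both approaches. The class name `LittleEndianInts` and the method signatures are assumptions for illustration only; they are not the code in this PR. It only shows that an `IntBuffer` view over a byte array produces the same bytes as hand-written little-endian packing.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.util.Arrays;

public class LittleEndianInts {

    // Hand-written little-endian store (signature assumed for illustration).
    public static void packIntLE(int value, byte[] dst, int off) {
        dst[off] = (byte) value;
        dst[off + 1] = (byte) (value >>> 8);
        dst[off + 2] = (byte) (value >>> 16);
        dst[off + 3] = (byte) (value >>> 24);
    }

    // Hand-written little-endian load (signature assumed for illustration).
    public static int unpackIntLE(byte[] src, int off) {
        return (src[off] & 0xFF)
                | (src[off + 1] & 0xFF) << 8
                | (src[off + 2] & 0xFF) << 16
                | (src[off + 3] & 0xFF) << 24;
    }

    // Little-endian int view over the array; writes go through to the array.
    public static IntBuffer littleEndianView(byte[] array) {
        return ByteBuffer.wrap(array).order(ByteOrder.LITTLE_ENDIAN).asIntBuffer();
    }

    public static void main(String[] args) {
        byte[] manual = new byte[8];
        byte[] viaBuffer = new byte[8];

        packIntLE(0x01020304, manual, 0);
        packIntLE(0xCAFEBABE, manual, 4);

        IntBuffer view = littleEndianView(viaBuffer);
        view.put(0, 0x01020304);
        view.put(1, 0xCAFEBABE);

        // Both arrays should hold identical bytes.
        System.out.println(Arrays.equals(manual, viaBuffer));
        System.out.println(Integer.toHexString(unpackIntLE(viaBuffer, 4)));
    }
}
```

Both variants compile and run on Java 8; which one is faster depends on the JDK, as the numbers below show.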
   
   However, using `IntBuffer.put()` instead of `packIntLE()`, and processing
   `int`s via `IntBuffer`s, makes matters _worse_ on Java 8 (a slow-down of
   20-30% compared to the version in this PR at commit e152cc3). This was
   benchmarked with an older JDK 8 (1.8.0_201) and with a brand-new 1.8.0_422.
   
   On Java 11, benchmarking shows a speed improvement of 5%, on Java 17 of 9%. 
(Again, with commit e152cc3 as baseline.)
   
   The problem is that on Java 8 the `ByteBufferAsIntBufferL` implementation
   ends up doing exactly the same as our `packIntLE()`/`unpackIntLE()`
   routines. Processing whole blocks as `int`s therefore still does lots of
   conversions to assemble bytes into `int`s, or to split an `int` into its
   four bytes. So it's definitely not worth it on Java 8.
   
   On Java 11 or newer, `ByteBufferAsIntBufferL` is much more efficient and can 
write or read whole `int`s.
   Only then does such a change make sense.
   
   But we could make the choice at run-time, based on the Java version we're 
running on, using `IntBuffer`s only
   for Java >= 11. Or we could try a multi-release JAR for sshd-common (and 
sshd-osgi).
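
A rough sketch of what the run-time choice could look like, assuming we decide once at class-load time from the `java.specification.version` system property. The class name `JavaVersionSwitch` and the flag `USE_INT_BUFFERS` are hypothetical; nothing like this exists in the PR.

```java
public class JavaVersionSwitch {

    // Java 8 and older report "1.8", "1.7", ...; Java 9 and newer report "9", "11", "17", ...
    public static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    // Hypothetical flag: take the IntBuffer code path only on Java 11 or newer.
    public static final boolean USE_INT_BUFFERS =
            majorVersion(System.getProperty("java.specification.version")) >= 11;

    public static void main(String[] args) {
        System.out.println("Java major version: "
                + majorVersion(System.getProperty("java.specification.version")));
        System.out.println("Use IntBuffers: " + USE_INT_BUFFERS);
    }
}
```

A multi-release JAR would avoid the run-time check, at the cost of a more complicated build.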
   
   Either way, I'd prefer not to attempt that in this PR. If we do something 
like this, let's do it in some
   later commit.
   
   > ...multithreaded implementation of chacha20-poly1305...
   
   The idea is that one can pre-compute the ChaCha20 key stream asynchronously.
   This may indeed give some improvements for file transfers. An implementation
   might not be quite trivial, though, and it may mean that we'd need at least
   one extra thread for every connection that uses ChaCha20-Poly1305, which
   might be a problem for a server, or for a client making many connections in
   parallel. Maybe it would need another fixed-size thread pool, and fall back
   to computing the key stream synchronously if it has not been precomputed by
   the time it's needed. In any case, it's not trivial to get right. The
   Poly1305 MAC computation cannot be offloaded in that way.
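
Just to illustrate the shape of the "shared pool plus synchronous fallback" idea, here is a very rough sketch. Everything in it is hypothetical: `KeyStreamPrefetcher` and `BlockFunction` are made-up names, the block function in `main` is a toy stand-in and not ChaCha20, and the class assumes a single consumer thread per instance. It ignores the hard parts (nonce and counter handling, key changes, back-pressure).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class KeyStreamPrefetcher {

    /** Hypothetical stand-in for a key stream block function; must be thread-safe. */
    public interface BlockFunction {
        byte[] block(long counter);
    }

    private final ExecutorService pool; // shared, fixed-size pool owned by the caller
    private final BlockFunction fn;
    private Future<byte[]> pending;     // at most one block in flight per instance
    private long pendingCounter;

    public KeyStreamPrefetcher(ExecutorService pool, BlockFunction fn) {
        this.pool = pool;
        this.fn = fn;
    }

    /** Ask the pool to compute the block for the given counter in the background. */
    public void prefetch(long counter) {
        if (pending != null) {
            pending.cancel(false);
        }
        Callable<byte[]> task = () -> fn.block(counter);
        pendingCounter = counter;
        pending = pool.submit(task);
    }

    /** Return the block; use the prefetched one only if it is already finished. */
    public byte[] take(long counter) {
        Future<byte[]> f = pending;
        pending = null;
        if (f != null) {
            if (pendingCounter == counter && f.isDone() && !f.isCancelled()) {
                try {
                    return f.get(); // does not block: the task is already done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (ExecutionException e) {
                    // background computation failed; fall through and compute here
                }
            } else {
                f.cancel(false); // wrong counter or not ready yet
            }
        }
        return fn.block(counter); // synchronous fallback
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Toy block function, NOT ChaCha20: 64 bytes filled with the counter value.
            KeyStreamPrefetcher p = new KeyStreamPrefetcher(pool, c -> {
                byte[] b = new byte[64];
                java.util.Arrays.fill(b, (byte) c);
                return b;
            });
            p.prefetch(1);
            System.out.println(p.take(1)[0]);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Because `take()` never waits on the pool, a saturated pool only costs the potential speed-up; it cannot stall a connection.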
   
   
   
   


