He-Pin opened a new pull request, #3019:
URL: https://github.com/apache/pekko/pull/3019

   ### Motivation
   
   The JDK `SSLEngine` works on `byte[]` internally and makes an extra 
direct<->heap copy on every `wrap`/`unwrap` when handed a direct buffer (see 
Netty's `SslHandler` `SslEngineType.JDK`, which deliberately uses heap buffers 
for the JDK engine for exactly this reason). `TlsGraphStage` took its three 
transport buffers from the TCP direct `BufferPool`, so every record paid that 
copy and each connection pinned ~384 KiB of direct memory for buffers that 
normally only hold a single ~16 KiB TLS record.
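
As a rough illustration of the heap-buffer approach (not the code in this PR), the sketch below allocates heap `ByteBuffer`s sized from the engine's `SSLSession`. The class name `HeapTlsBuffers`, its field names and the `forDefaultEngine` helper are hypothetical; only `ByteBuffer.allocate`, `SSLSession.getPacketBufferSize` and `SSLSession.getApplicationBufferSize` are real JDK calls.

```java
import java.nio.ByteBuffer;
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLSession;

// Hypothetical holder for the buffers a TLS stage hands to SSLEngine.
public final class HeapTlsBuffers {
    final ByteBuffer transportIn;   // encrypted bytes read from the network
    final ByteBuffer transportOut;  // encrypted bytes produced by wrap()
    final ByteBuffer userOut;       // plaintext produced by unwrap()

    private HeapTlsBuffers(int packetSize, int applicationSize) {
        // ByteBuffer.allocate returns a heap buffer backed by a byte[],
        // so no direct memory is reserved for these buffers.
        this.transportIn = ByteBuffer.allocate(packetSize);
        this.transportOut = ByteBuffer.allocate(packetSize);
        this.userOut = ByteBuffer.allocate(applicationSize);
    }

    // Size every buffer for a single TLS record, as reported by the session.
    static HeapTlsBuffers forEngine(SSLEngine engine) {
        SSLSession session = engine.getSession();
        return new HeapTlsBuffers(
            session.getPacketBufferSize(),
            session.getApplicationBufferSize());
    }

    // Convenience for trying the sketch with the JVM's default SSLContext.
    static HeapTlsBuffers forDefaultEngine() {
        try {
            return forEngine(SSLContext.getDefault().createSSLEngine());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("no default SSLContext available", e);
        }
    }
}
```

A buffer created this way reports `isDirect() == false` and `hasArray() == true`, which is the property the motivation above relies on.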
   
   ### Modification
   
   Allocate the transport buffers on the heap and size them on demand like 
Netty: start at one packet and let `growTransportOutBuffer` enlarge the wrap 
buffer (capped at `MaxApplicationRecordsPerEngineCall` packets) only when a 
larger batch is actually produced. Remove the now-unused `BufferPool`/`Tcp` 
wiring and the pool-derived `applicationBufferSize` helper.
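
The on-demand sizing described above could look roughly like the following. This is a minimal sketch under assumptions, not the actual `growTransportOutBuffer`: the class name, the method signature and the cap value of 16 are all hypothetical (the PR names `MaxApplicationRecordsPerEngineCall` but does not state its value).

```java
import java.nio.ByteBuffer;

public final class TransportBufferSketch {
    // Hypothetical stand-in for MaxApplicationRecordsPerEngineCall.
    static final int MAX_RECORDS_PER_ENGINE_CALL = 16;

    /**
     * Returns a heap buffer in write mode with room for at least `needed`
     * more bytes. Bytes already written to `out` are copied across.
     * The new capacity is rounded up to a whole number of packets and may
     * not exceed MAX_RECORDS_PER_ENGINE_CALL packets.
     */
    static ByteBuffer grow(ByteBuffer out, int needed, int packetSize) {
        if (out.remaining() >= needed) {
            return out; // common case: the current buffer is large enough
        }
        int required = out.position() + needed;
        int cap = packetSize * MAX_RECORDS_PER_ENGINE_CALL;
        if (required > cap) {
            throw new IllegalStateException(
                "required " + required + " bytes exceeds cap of " + cap);
        }
        // Round up to a whole number of packets.
        int packets = (required + packetSize - 1) / packetSize;
        ByteBuffer bigger = ByteBuffer.allocate(packets * packetSize);
        out.flip();      // switch the old buffer to read mode
        bigger.put(out); // copy the pending bytes; bigger stays in write mode
        return bigger;
    }
}
```

Starting from `ByteBuffer.allocate(packetSize)` and calling `grow` only when a write does not fit keeps a connection that never batches at a single packet of heap.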
   
   Only the opt-in GraphStage TLS engine is touched; the default legacy actor 
engine (`TLSActor`) is unchanged.
   
   ### Result
   
   Measured with the existing `TlsBenchmark` (`warmRoundTrip`, `-prof gc`, 
`implementation=graphstage`). Throughput on this machine was too noisy to 
compare (±25-100%), so per-op heap allocation (`gc.alloc.rate.norm`, error <1%) 
is the reliable signal:
   
   | payload | before (direct/pool) | after (on-demand heap) | Δ |
   | ---: | ---: | ---: | ---: |
   | 256 B | 1657 B/op | 1672 B/op | +0.9% |
   | 4 KiB | 20559 B/op | 18980 B/op | −7.7% |
   | 64 KiB | 301864 B/op | 286199 B/op | −5.2% |
   
   Direct memory per connection drops from ~384 KiB to zero. Heap use stays 
small for connections that carry little data or are short-lived, because the 
buffers are sized on demand, and grows only for large batched writes. 
Behaviour is unchanged.
   
   ### Tests
   
   - `stream-tests/testOnly *TlsGraphStageIsolatedSpec 
*TlsGraphStageEdgeCasesSpec *TlsGraphStageSpec *TlsSpec 
*TlsEngineSelectionSpec` → 246 succeeded, 0 failed
   - Added a targeted test, `grow the on-demand transport buffers for 
multi-record payloads without losing bytes`, that drives a 200 KiB payload 
through the new grow path and asserts a byte-exact round trip.
   - `bench-jmh` `TlsBenchmark.warmRoundTrip -prof gc` (numbers above).
   
   ### References
   
   Refs #2878
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

