arturobernalg commented on PR #578:
URL: 
https://github.com/apache/httpcomponents-core/pull/578#issuecomment-4106364938

     Hi @ok2c ,
   
     I've spent considerable time benchmarking the classic-over-async facade 
trying to reach the 5% improvement threshold. Here's a summary of what I found.
   
     Benchmark setup: 50,000 requests, concurrency 200, 1MB response bodies, 
HTTP/1.1, 6 rounds per configuration, A/B comparison in the same session where 
possible.
   
     What I tried:
   
     1. `PooledByteBufferAllocator` wired through `ClassicToAsyncAdaptor` → `SharedInputBuffer`: result within noise (±2%). During buffer expansion (2KB→4KB→...→2MB), only the intermediate buffers are recycled. The final ~2MB buffer cannot be safely returned to the pool because `releaseResources()` races with async framework callbacks (use-after-free), so every request still allocates a fresh final buffer.
     2. Content-Length pre-sizing (read the Content-Length header and pre-allocate the `SharedInputBuffer` to the full response size): actually 16% slower. With 200 concurrent connections, pre-allocating 200×1MB buffers up front overwhelms the system and disrupts the flow-control dynamics (the capacity channel reports 1MB available instead of 2KB).
     3. `SharedInputBuffer` micro-optimizations (`signalAll()` → `signal()`, `AtomicInteger` → plain `int` for the capacity increment, since it is always accessed under the lock): semantically correct improvements, but not measurable (within noise).
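
     To illustrate why item 1 recycles only intermediate buffers, here is a minimal, self-contained sketch of the one-way growth pattern. The `ToyPool` class and `runRequest` helper are hypothetical stand-ins (not the real `PooledByteBufferAllocator` API): buffers are recycled by capacity, each expansion step releases the smaller buffer back, and the final 2MB buffer is deliberately never released because of the `releaseResources()` race described above.

     ```java
     import java.nio.ByteBuffer;
     import java.util.ArrayDeque;
     import java.util.Deque;

     // Toy capacity-keyed pool, a stand-in for the real allocator.
     final class ToyPool {
         private final Deque<ByteBuffer> pool = new ArrayDeque<>();
         int hits; // acquisitions served from the pool rather than allocated fresh

         ByteBuffer acquire(int capacity) {
             for (ByteBuffer b : pool) {
                 if (b.capacity() == capacity) {
                     pool.remove(b);
                     hits++;
                     b.clear();
                     return b;
                 }
             }
             return ByteBuffer.allocate(capacity);
         }

         void release(ByteBuffer b) {
             pool.push(b);
         }
     }

     public class GrowthSketch {
         // One request: grow 2KB -> 4KB -> ... -> 2MB, recycling each
         // intermediate buffer. Returns pool hits observed for this request.
         static int runRequest(ToyPool pool) {
             int before = pool.hits;
             ByteBuffer buf = pool.acquire(2 * 1024);
             while (buf.capacity() < 2 * 1024 * 1024) {
                 ByteBuffer bigger = pool.acquire(buf.capacity() * 2);
                 pool.release(buf); // intermediate buffer goes back to the pool
                 buf = bigger;
             }
             // The final ~2MB buffer is NOT released: releaseResources() may
             // race with async callbacks still holding it (use-after-free risk).
             return pool.hits - before;
         }

         public static void main(String[] args) {
             ToyPool pool = new ToyPool();
             System.out.println("hits, request 1 = " + runRequest(pool)); // 0: sizes only grow
             System.out.println("hits, request 2 = " + runRequest(pool)); // 10: intermediates reuse,
                                                                         // final 2MB still fresh
         }
     }
     ```

     The second request reuses every intermediate size (2KB through 1MB) but still allocates the largest buffer fresh, which matches the "rounding error" savings observed in the benchmark.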
   
     Conclusion: At concurrency 200 with 1MB responses, the system is 
transferring ~2.5GB/sec of content. The bottleneck is memory bandwidth and 
network I/O, not buffer allocation or lock contention in
     `SharedInputBuffer`. The buffer management overhead is a rounding error 
compared to the cost of moving data through the network stack.
   
     The `PooledByteBufferAllocator` itself performs well in isolation (JMH: 471 ops/ms vs 194 for `SimpleByteBufferAllocator` at 1KB HEAP), but the classic-over-async facade's one-way buffer growth pattern prevents the pool from being effective. The allocator would likely show better results in code paths where buffers are allocated and released at the same size repeatedly, like async framework internals or H2 frame handling.
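
     For contrast with the growth pattern above, here is a minimal sketch (again not the real allocator API; the `FixedSizeReuse` class and 16KB frame size are illustrative assumptions) of the fixed-size acquire/release pattern where pooling does pay off: every buffer comes back at the capacity it was borrowed at, so after warm-up each iteration is a pool hit.

     ```java
     import java.nio.ByteBuffer;
     import java.util.ArrayDeque;
     import java.util.Deque;

     // Hypothetical fixed-size buffer pool, e.g. for frame-sized buffers.
     public class FixedSizeReuse {
         static final int FRAME = 16 * 1024; // illustrative frame-sized buffer

         private final Deque<ByteBuffer> pool = new ArrayDeque<>();
         int allocations; // fresh allocations, i.e. pool misses

         ByteBuffer acquire() {
             ByteBuffer b = pool.poll();
             if (b == null) {
                 allocations++;
                 return ByteBuffer.allocate(FRAME);
             }
             b.clear();
             return b;
         }

         void release(ByteBuffer b) {
             pool.push(b);
         }

         public static void main(String[] args) {
             FixedSizeReuse pool = new FixedSizeReuse();
             for (int i = 0; i < 10_000; i++) {
                 ByteBuffer frame = pool.acquire();
                 // ... fill and flush the frame ...
                 pool.release(frame); // released at the same size it was acquired
             }
             // Only the very first iteration allocates; the rest reuse it.
             System.out.println("allocations = " + pool.allocations);
         }
     }
     ```

     With same-size acquire/release pairs the steady state allocates nothing, which is why the pool's isolated JMH numbers look good even though the facade's one-way growth cannot exploit them.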
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
