clintropolis opened a new pull request, #19567:
URL: https://github.com/apache/druid/pull/19567

   ### Description
   Third time's a charm? Previous attempts #12032 and #14479 afaict both got 
stuck trying to resolve failures in the integration tests (the old docker based 
integration tests). I started fresh for this attempt, but the embedded-tests 
that have replaced our old frameworks made it a lot easier to dig in and 
determine the problem (and credit to prior attempts which were used as 
reference for comparison)
   
   The root problem with prior attempts that resulted in the observed hanging 
is that Netty 4's `setAutoRead(true)` is just a config flag. Netty 3's 
`setReadable(true)` also kicked an immediate read. For Druid's chunked 
responses where the server flushes the status line in one TCP write and the 
body in subsequent writes (e.g. `QueryResource`), the body bytes sit on the 
wire for ~5 minutes until connection close. Fix is a one-line `ctx.read()` 
after `handleResponse` in `NettyHttpClient` with a comment explaining the 
semantic difference.
   
   #### Allocator and direct memory
   The HTTP client previously used Netty 3's `HeapChannelBufferFactory` which 
is unpooled heap buffers. Netty 4.2's default is `AdaptiveByteBufAllocator` 
(pooled internally via `AdaptivePoolingAllocator`, direct by default). This is 
a substantive change in memory characteristics:
   
   - **Direct memory**: the HTTP client now consumes some direct memory. 
Operators
     running close to their `-XX:MaxDirectMemorySize` budget may need to bump 
it.
   - **Pooled vs unpooled**: buffers are now pooled. Resident memory may show
     higher than under Netty 3 even when idle, normal pooled-allocator behavior.
   
   This PR is using Netty 4.2's default allocator (`adaptive`) rather than 
pinning to the `PooledByteBufAllocator` that was the Netty 4 default until 4.2 
(or the unpooled heap behavior of Netty 3). Since Druid operators are coming 
from heap-unpooled (Netty 3), neither Netty 4 choice represents a "known good 
baseline"; both are new characteristics for this workload. However, using 
pooled off-heap buffers of some kind should offer a performance improvement so 
it seems worth the risk.
   
   The allocator can be swapped without a code change:
   
    | `-Dio.netty.allocator.type=…` | Effect |
    |---|---|
    | `adaptive` (default in Netty 4.2) | What this PR ships with |
    | `pooled` | The classic `PooledByteBufAllocator` |
    | `unpooled` | Closest to Netty 3 behavior; combine with 
`-Dio.netty.noPreferDirect=true` to also stay on-heap |
   
   Other useful knobs: 
   
     | Property | Default | Purpose |
     |---|---|---|
     | `io.netty.noPreferDirect` | `false` | `true` ⇒ pooled but heap rather 
than direct |
     | `io.netty.allocator.maxOrder` | 9 | Smaller value ⇒ smaller chunk size, 
lower memory floor |
     | `io.netty.maxDirectMemory` | derived from `-XX:MaxDirectMemorySize` | 
Netty's direct-memory cap |
     | `io.netty.leakDetection.level` | `simple` | `disabled` / `simple` / 
`advanced` / `paranoid` |
   
   #### Other stuff
   `InputStreamHolder.fromChannelBuffer(ByteBuf, long)` now copies bytes 
synchronously instead of wrapping a `ByteBufInputStream` over a pooled buffer 
that `SimpleChannelInboundHandler` releases after `channelRead0` returns. The 
prior PRs hit CodeQL leak warnings on 
`SequenceInputStreamResponseHandler.java`; that handler is now leak-safe too, 
and so are `DirectDruidClient` and `DataServerResponseHandler` which all funnel 
through the same helper.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to