clintropolis opened a new pull request, #19567:
URL: https://github.com/apache/druid/pull/19567
### Description
Third time's a charm? Previous attempts #12032 and #14479 afaict both got
stuck trying to resolve failures in the integration tests (the old docker based
integration tests). I started fresh for this attempt, but the embedded-tests
that have replaced our old frameworks made it a lot easier to dig in and
determine the problem (and credit to prior attempts which were used as
reference for comparison)
The root problem with prior attempts that resulted in the observed hanging
is that Netty 4's `setAutoRead(true)` is just a config flag. Netty 3's
`setReadable(true)` also kicked an immediate read. For Druid's chunked
responses where the server flushes the status line in one TCP write and the
body in subsequent writes (e.g. `QueryResource`), the body bytes sit on the
wire for ~5 minutes until connection close. Fix is a one-line `ctx.read()`
after `handleResponse` in `NettyHttpClient` with a comment explaining the
semantic difference.
#### Allocator and direct memory
The HTTP client previously used Netty 3's `HeapChannelBufferFactory` which
is unpooled heap buffers. Netty 4.2's default is `AdaptiveByteBufAllocator`
(pooled internally via `AdaptivePoolingAllocator`, direct by default). This is
a substantive change in memory characteristics:
- **Direct memory**: the HTTP client now consumes some direct memory.
Operators
running close to their `-XX:MaxDirectMemorySize` budget may need to bump
it.
- **Pooled vs unpooled**: buffers are now pooled. Resident memory may show
higher than under Netty 3 even when idle, normal pooled-allocator behavior.
This PR is using Netty 4.2's default allocator (`adaptive`) rather than
pinning to the `PooledByteBufAllocator` that was the Netty 4 default until 4.2
(or the unpooled heap behavior of Netty 3). Since Druid operators are coming
from heap-unpooled (Netty 3), neither Netty 4 choice represents a "known good
baseline"; both are new characteristics for this workload. However, using
pooled off-heap buffers of some kind should offer a performance improvement so
it seems worth the risk.
The allocator can be swapped without a code change:
| `-Dio.netty.allocator.type=…` | Effect |
|---|---|
| `adaptive` (default in Netty 4.2) | What this PR ships with |
| `pooled` | The classic `PooledByteBufAllocator` |
| `unpooled` | Closest to Netty 3 behavior; combine with
`-Dio.netty.noPreferDirect=true` to also stay on-heap |
Other useful knobs:
| Property | Default | Purpose |
|---|---|---|
| `io.netty.noPreferDirect` | `false` | `true` ⇒ pooled but heap rather
than direct |
| `io.netty.allocator.maxOrder` | 9 | Smaller value ⇒ smaller chunk size,
lower memory floor |
| `io.netty.maxDirectMemory` | derived from `-XX:MaxDirectMemorySize` |
Netty's direct-memory cap |
| `io.netty.leakDetection.level` | `simple` | `disabled` / `simple` /
`advanced` / `paranoid` |
#### Other stuff
`InputStreamHolder.fromChannelBuffer(ByteBuf, long)` now copies bytes
synchronously instead of wrapping a `ByteBufInputStream` over a pooled buffer
that `SimpleChannelInboundHandler` releases after `channelRead0` returns. The
prior PRs hit CodeQL leak warnings on
`SequenceInputStreamResponseHandler.java`; that handler is now leak-safe too,
and so are `DirectDruidClient` and `DataServerResponseHandler` which all funnel
through the same helper.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]