smengcl opened a new pull request, #10353:
URL: https://github.com/apache/ozone/pull/10353

   Generated-by: Claude Code (Opus 4.7)
   
   ## What changes were proposed in this pull request?
   
   `BufferUtils.assignByteBuffers` previously allocated heap `ByteBuffer`s for 
chunk reads. Reading from a `FileChannel` into a heap buffer forces the JDK to 
first read into a thread-local `DirectByteBuffer` (in `sun.nio.ch.IOUtil`) and 
then memcpy the chunk-sized payload into the user's heap buffer. So every 
default-config chunk read paid one extra user-space chunk-sized memcpy plus one 
chunk-sized heap allocation.
   
   Two prior PRs added opt-in direct-buffer paths to skip exactly this copy:
   
   * HDDS-7188 (Tsz-Wo Nicholas Sze, Jan 2025): 
`ChunkUtils.readDataNettyChunkedNioFile` via Netty 
`PooledByteBufAllocator.DEFAULT`
   * HDDS-10488 (Sammi Chen, Sep 2024): `MappedBufferManager` mmap reads with 
semaphore-tracked quota
   
   Both deliberately avoided raw `ByteBuffer.allocateDirect` because it is a 
known leak pattern under sustained load. Tiny wrapper objects mean GC fires 
rarely, the `Cleaner` that frees direct memory never runs promptly, and the 
native budget exhausts with `OutOfMemoryError: Direct buffer memory`. Neither 
PR updated the default `assignByteBuffers` path, so the heap allocation stayed.
   
   This change uses Hadoop's `org.apache.hadoop.util.DirectBufferPool`, the 
same pool already used by `BlockOutputStream` and `KeyValueContainerCheck`. 
Release plumbing lives entirely in `ChunkUtils.readData`: a release lambda is 
registered on `DispatcherContext.setReleaseMethod`, fires once in 
`GrpcXceiverService`'s `finally` block after `responseObserver.onNext` returns 
(the standard server marshaller is synchronous, so the response bytes are 
already serialized into the Netty outgoing buffer by then), and returns each 
buffer to the pool. Paths without a release-supporting context fall back to GC 
reclaim via the pool's `WeakReference` layer, so the pool degrades to plain 
`allocateDirect` rather than leaking. `ChunkBuffer` and 
`ChunkBufferImplWithByteBufferList` are unchanged from master.
   
   A consumer audit confirmed no `.array()` usage breaks with direct buffers. 
`ChecksumByteBufferImpl` guards `.array()` with `hasArray()`, and 
`UnsafeByteOperations.unsafeWrap` of a direct `ByteBuffer` produces a zero-copy 
`NioByteString` without calling `.array()`. One test 
(`TestChunkUtils.concurrentReadWriteOfSameFile`) called `.array()` directly and 
was updated to use `get(byte[])`.
   
   ### Expected impact
   
   Per chunk read:
   
   * One chunk-sized user-space memcpy eliminated (~270 µs for a 4 MB chunk on 
a typical x86 server with ~15 GB/s memory bandwidth).
   * One chunk-sized heap allocation eliminated (heap churn moves to bounded 
direct memory in the pool).
   
   Wall-clock effect depends on storage:
   
   | Storage | Latency per chunk | Effect |
   |---|---|---|
   | Page-cache hit | sub-ms total | meaningful (memcpy was a large fraction) |
   | NVMe sequential | ~1 ms total | ~20 percent reduction |
   | SATA SSD | ~10 ms total | a few percent |
   | HDD | tens of ms total | invisible at wall-clock |
   
   Heap-churn reduction is consistent across storage tiers: at 100 MB/s read 
throughput a DN sheds roughly 100 MB/s of heap allocation.
   
   ### Pool sizing and operational note
   
   `DirectBufferPool` is unbounded by design (per `ConcurrentMap<Integer, 
Queue<WeakReference<ByteBuffer>>>`). Steady-state pool memory converges to peak 
concurrent in-flight buffers per capacity class. For default-config chunk reads 
(4 MB) on a 64-thread DN, that is roughly 256 MB of direct memory. Operators 
should ensure `-XX:MaxDirectMemorySize` accommodates this; the existing 
`BlockOutputStream` and `KeyValueContainerCheck` consumers of 
`DirectBufferPool` are already sized this way.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-15359
   
   ## How was this patch tested?
   
   - Existing tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to