[
https://issues.apache.org/jira/browse/IGNITE-28853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmitry Werner updated IGNITE-28853:
-----------------------------------
Epic Link: IGNITE-25881
Ignite Flags: (was: Docs Required,Release Notes Required)
> CompressedMessage: excessive copying and per-message direct buffer
> allocations on both send and receive paths
> -------------------------------------------------------------------------------------------------------------
>
> Key: IGNITE-28853
> URL: https://issues.apache.org/jira/browse/IGNITE-28853
> Project: Ignite
> Issue Type: Task
> Reporter: Anton Vinogradov
> Assignee: Dmitry Werner
> Priority: Major
> Labels: IEP-132, ise
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> CompressedMessage moves the same bytes through memory 3-5 times and allocates
> direct ByteBuffers per message on both sides of the wire. Direct allocation
> is expensive (memory zeroing, Cleaner-based release, potential System.gc()
> inside Bits.reserveMemory), yet no point on the path actually needs a
> direct buffer: data arrives in and leaves through heap arrays.
> Send path: compress() copies the whole source buffer into a byte[], deflates
> via DeflaterOutputStream (512-byte internal buffer -> many small JNI calls)
> into a ByteArrayOutputStream pre-sized to the *uncompressed* length, then
> copies again via toByteArray(); ChunkedByteReader then copies every 10K chunk
> into a fresh array one more time.
> Receive path: CompressedMessageSerializer.readFrom() accumulates incoming
> chunks into a 100KB direct ByteBuffer allocated per message (grown by
> doubling through another copy), although each chunk is already a fresh heap
> array returned by readByteArray(); uncompress() copies it all back into a
> heap array and inflates via InflaterInputStream.readAllBytes() (internal 8K
> buffers + final consolidation copy) despite the exact result size being known
> upfront; DirectMessageReader.readCompressedMessageAndDeserialize() then
> copies the whole uncompressed payload into yet another per-message direct
> buffer, although DirectByteBufferStream fully supports heap buffers.
> Fix (wire format unchanged):
> * Internal representation switched to List<byte[]> chunks for both
> directions, ChunkedByteReader removed.
> * compress(): raw Deflater with setInput(ByteBuffer) (no input copy),
> deflating straight into wire-ready chunks - compressed bytes are written
> exactly once.
> * readFrom(): a received chunk is simply added to the list - zero copies,
> zero direct allocations.
> * uncompress(): raw Inflater fed chunk by chunk into an exact-size
> byte[dataSize].
> * readCompressedMessageAndDeserialize(): ByteBuffer.wrap(uncompressed)
> instead of allocateDirect+put+flip.
> JMH (GridDhtPartitionsFullMessage receive round-trip with two @Compress map
> fields, JDK 17, M-series):
> * 30 entries: 15.2K +/- 34.8K -> 100.9K +/- 6.2K ops/s (~6.6x; master's huge
> variance is caused by per-message direct allocations triggering GC storms),
> heap 66.7K -> 25.2K B/op (-62%).
> * 500 entries: 4.38K -> 5.76K ops/s (+31%), heap 522K -> 431K B/op (-18%).
> * On top of the heap savings, all per-message direct buffer allocations
> (~365KB/op at 500 entries, invisible to gc.alloc.rate) are eliminated.
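
The compress() and uncompress() items in the fix list above can be illustrated with a minimal Java sketch. It is not the Ignite patch: the class name, method signatures and CHUNK_SIZE constant are hypothetical, and only the java.util.zip calls are real JDK API (Deflater.setInput(ByteBuffer) requires JDK 11+). It assumes the default zlib-wrapped stream that DeflaterOutputStream and InflaterInputStream also produce and consume by default.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Illustrative sketch only; names and sizes are assumptions, not Ignite code. */
final class ChunkedCompressionSketch {
    /** Hypothetical chunk size, chosen to mirror the 10K chunks mentioned in the issue. */
    static final int CHUNK_SIZE = 10 * 1024;

    /** Deflates the remaining bytes of {@code src} directly into chunk-sized heap arrays. */
    static List<byte[]> compress(ByteBuffer src) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(src); // Reads from the buffer, no intermediate byte[] copy.
            deflater.finish();

            List<byte[]> chunks = new ArrayList<>();

            while (!deflater.finished()) {
                byte[] chunk = new byte[CHUNK_SIZE];
                int n = 0;

                // Fill the current chunk until it is full or the stream ends.
                while (n < CHUNK_SIZE && !deflater.finished())
                    n += deflater.deflate(chunk, n, CHUNK_SIZE - n);

                // Only the last, partially filled chunk is trimmed (one small copy).
                if (n > 0)
                    chunks.add(n == CHUNK_SIZE ? chunk : Arrays.copyOf(chunk, n));
            }

            return chunks;
        }
        finally {
            deflater.end(); // Release native zlib resources.
        }
    }

    /** Inflates the chunks one by one into an array of the known uncompressed size. */
    static byte[] uncompress(List<byte[]> chunks, int dataSize) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            byte[] out = new byte[dataSize];
            int pos = 0;

            for (byte[] chunk : chunks) {
                inflater.setInput(chunk);

                while (!inflater.finished() && !inflater.needsInput() && pos < dataSize) {
                    int n = inflater.inflate(out, pos, dataSize - pos);

                    if (n == 0 && inflater.needsDictionary())
                        throw new DataFormatException("Preset dictionaries are not supported");

                    pos += n;
                }
            }

            if (pos != dataSize)
                throw new DataFormatException("Expected " + dataSize + " bytes, got " + pos);

            return out;
        }
        finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] data = new byte[50_000];
        new Random(42).nextBytes(data); // Incompressible input, so several chunks are produced.

        List<byte[]> chunks = compress(ByteBuffer.wrap(data));
        byte[] restored = uncompress(chunks, data.length);

        System.out.println("chunks=" + chunks.size() + ", roundTripOk=" + Arrays.equals(data, restored));
    }
}
```

The result of uncompress() is a plain heap array, so a caller could hand it on via ByteBuffer.wrap(...) rather than copying it into a direct buffer, which is the substitution the last fix item describes.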