[
https://issues.apache.org/jira/browse/IGNITE-28836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18093414#comment-18093414
]
Ignite TC Bot commented on IGNITE-28836:
----------------------------------------
{panel:title=Branch: [pull/13296/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/13296/head] Base: [master] : No new tests found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}{panel}
[TeamCity *--> Run :: All* Results|https://ci2.ignite.apache.org/viewLog.html?buildId=9169195&buildTypeId=IgniteTests24Java8_RunAll]
> DirectMessageWriter: reduce per-field overhead and per-message allocations on
> the message serialization hot path
> ----------------------------------------------------------------------------------------------------------------
>
> Key: IGNITE-28836
> URL: https://issues.apache.org/jira/browse/IGNITE-28836
> Project: Ignite
> Issue Type: Task
> Reporter: Anton Vinogradov
> Assignee: Anton Vinogradov
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> *Motivation*
> {{DirectMessageWriter}} is on the critical path of every outgoing message. It
> has two inefficiencies:
> # Each of the ~30 write methods re-resolves {{state.item().stream}} (array
> load + bounds check + field load) on every primitive write.
> # {{writeCompressedMessage()}} allocates a fresh 10K
> {{ByteBuffer.allocateDirect()}} buffer per compressed field (plus a doubling
> re-allocation chain for payloads above 10K) and a brand-new
> {{DirectMessageWriter}} per field.
> *Fix*
> # Cache the current stream in {{curStream}}, refreshed only when the current
> state item changes ({{setBuffer}} / {{beforeNestedWrite}} /
> {{afterNestedWrite}}).
> # {{CompressedMessage}} consumes the scratch buffer right in its constructor
> (deflates into its own byte[]), so the buffer never escapes
> {{writeCompressedMessage()}}. The writer therefore keeps one reusable heap
> scratch buffer (retained at the largest size seen) and a thread-confined
> reusable {{tmpWriter}}, and grows the scratch buffer without an intermediate
> byte[] copy.
> No wire-format or public API changes; behavior-preserving, safe to backport.
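
The first Fix item can be illustrated with a minimal sketch. It is not the actual patch: {{Stream}}, {{StateItem}}, the fixed nesting depth, and the single {{writeInt}} method are hypothetical stand-ins; only the names {{curStream}}, {{setBuffer}}, {{beforeNestedWrite}} and {{afterNestedWrite}} come from the description above.

```java
import java.nio.ByteBuffer;

/** Illustrative only: caches the current stream instead of re-resolving it per write. */
final class CachedStreamSketch {
    /** Hypothetical stand-in for the per-nesting-level output stream. */
    static final class Stream {
        ByteBuffer buf;

        void writeInt(int val) {
            buf.putInt(val);
        }
    }

    /** Hypothetical stand-in for one writer state item. */
    static final class StateItem {
        final Stream stream = new Stream();
    }

    static final class Writer {
        // Assumption: a small fixed nesting depth is enough for the sketch.
        private final StateItem[] items = {new StateItem(), new StateItem(), new StateItem()};
        private int depth;

        // Cached reference; refreshed only at the three points below.
        private Stream curStream = items[0].stream;

        void setBuffer(ByteBuffer buf) {
            items[depth].stream.buf = buf;
            curStream = items[depth].stream;
        }

        void beforeNestedWrite() {
            ByteBuffer buf = curStream.buf;
            depth++;
            items[depth].stream.buf = buf;
            curStream = items[depth].stream;
        }

        void afterNestedWrite() {
            depth--;
            curStream = items[depth].stream;
        }

        // Hot path: one field load, no array load or bounds check.
        void writeInt(int val) {
            curStream.writeInt(val);
        }
    }
}
```

The trade-off is that every place that changes the current state item must refresh the cached reference, which is why the description lists those refresh points explicitly.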
> *Benchmark* ({{JmhDirectMessageWriterBenchmark}}, added by the patch: an
> exchange-style message with two compressed map fields sized below/above the
> initial 10K scratch, and a ~35-field primitive message; JDK 17, -prof gc,
> master vs patched)
> ||benchmark||master||patched||delta||
> |compressedMapFields(30), throughput|17 550 ops/s|20 037 ops/s|*+14%*|
> |compressedMapFields(30), allocations|26.7 KB/op heap + 20 KB/op direct|24.3 KB/op heap, zero direct|-9% heap, no Cleaner churn|
> |compressedMapFields(30), GC time|192 ms|9 ms|*x21 less*|
> |compressedMapFields(500), throughput|2 382 ops/s|2 475 ops/s|+4% (within error)|
> |compressedMapFields(500), allocations|365 KB/op heap + ~300 KB/op direct|322 KB/op heap, zero direct|-12% heap, no Cleaner churn|
> |compressedMapFields(500), GC time|56 ms|11 ms|*x5 less*|
> |primitiveFields, throughput|34.9M ops/s|35.3M ops/s|within error|
> Master's direct-buffer churn is invisible to gc.alloc.rate.norm but shows up
> as GC time: Cleaner processing makes collections an order of magnitude more
> expensive at the same collection counts.
> *Testing*
> * DirectMarshallingMessagesTest — nested containers written through 16-byte
> chunks (multi-pass resume across buffer swaps, exercises the curStream
> refresh points).
> * CompressedMessageTest — >40K compressed payload (exercises the scratch
> growth path), byte-for-byte writer-to-reader round-trip.
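
The scratch growth path mentioned above could look roughly like the following sketch. It is an assumption-laden illustration, not the code from the patch: the class and method names ({{ScratchBufferSketch}}, {{acquire}}, {{grow}}) are hypothetical, and only the 10K initial size, the doubling, the heap allocation and the "no intermediate byte[] copy" behaviour are taken from the description.

```java
import java.nio.ByteBuffer;

/** Illustrative only: one reusable heap scratch buffer that grows by doubling. */
final class ScratchBufferSketch {
    /** Initial size taken from the issue description (10K). */
    private static final int INITIAL_CAPACITY = 10 * 1024;

    /** Retained between fields, so it stays at the largest size seen. */
    private ByteBuffer scratch = ByteBuffer.allocate(INITIAL_CAPACITY);

    /** Returns the shared buffer, reset for a new compressed field. */
    ByteBuffer acquire() {
        scratch.clear();
        return scratch;
    }

    /** Doubles the capacity and keeps the bytes written so far. */
    ByteBuffer grow() {
        ByteBuffer bigger = ByteBuffer.allocate(scratch.capacity() * 2);

        // Buffer-to-buffer transfer: no temporary byte[] is created here.
        scratch.flip();
        bigger.put(scratch);

        scratch = bigger;
        return scratch;
    }

    int capacity() {
        return scratch.capacity();
    }
}
```

Reusing the buffer this way is only safe if it never escapes the method that fills it, which matches the statement that {{CompressedMessage}} copies the data out in its constructor.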