[ 
https://issues.apache.org/jira/browse/IGNITE-28836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18093414#comment-18093414
 ] 

Ignite TC Bot commented on IGNITE-28836:
----------------------------------------

{panel:title=Branch: [pull/13296/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/13296/head] Base: [master] : No new tests found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}{panel}
[TeamCity *--> Run :: All* Results|https://ci2.ignite.apache.org/viewLog.html?buildId=9169195&buildTypeId=IgniteTests24Java8_RunAll]
{color:#ffffff}tcbot-analysis-comment chainBuildId=9169195 rerunBuildIds=none{color}

> DirectMessageWriter: reduce per-field overhead and per-message allocations on 
> the message serialization hot path
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-28836
>                 URL: https://issues.apache.org/jira/browse/IGNITE-28836
>             Project: Ignite
>          Issue Type: Task
>            Reporter: Anton Vinogradov
>            Assignee: Anton Vinogradov
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Motivation*
> DirectMessageWriter is on the critical path of every outgoing message. Two 
> inefficiencies:
> # Each of the ~30 write methods re-resolves {{state.item().stream}} (array 
> load + bounds check + field load) on every primitive write.
> # {{writeCompressedMessage()}} allocates a fresh 10K 
> {{ByteBuffer.allocateDirect()}} per compressed field (plus a doubling 
> re-allocation chain for payloads above 10K) and a brand-new 
> {{DirectMessageWriter}} per field.
> *Fix*
> # Cache the current stream in {{curStream}}, refreshed only when the current 
> state item changes ({{setBuffer}} / {{beforeNestedWrite}} / 
> {{afterNestedWrite}}).
> # {{CompressedMessage}} consumes the scratch buffer right in its constructor 
> (deflates into its own byte[]), so the buffer never escapes 
> {{writeCompressedMessage()}}: the writer now keeps one reusable heap scratch 
> buffer (retained at the largest size seen) and a thread-confined reusable 
> {{tmpWriter}}, and grows the scratch buffer without an intermediate byte[] 
> copy.
> No wire-format or public API changes; behavior-preserving, safe to backport.
> *Benchmark* ({{JmhDirectMessageWriterBenchmark}}, added by the patch: an 
> exchange-style message with two compressed map fields sized below/above the 
> initial 10K scratch, and a ~35-field primitive message; JDK 17, -prof gc, 
> master vs patched)
> ||benchmark||master||patched||delta||
> |compressedMapFields(30), throughput|17 550 ops/s|20 037 ops/s|*+14%*|
> |compressedMapFields(30), allocations|26.7 KB/op heap + 20 KB/op direct|24.3 KB/op heap, zero direct|-9% heap, no Cleaner churn|
> |compressedMapFields(30), GC time|192 ms|9 ms|*x21 less*|
> |compressedMapFields(500), throughput|2 382 ops/s|2 475 ops/s|+4% (within error)|
> |compressedMapFields(500), allocations|365 KB/op heap + ~300 KB/op direct|322 KB/op heap, zero direct|-12% heap, no Cleaner churn|
> |compressedMapFields(500), GC time|56 ms|11 ms|*x5 less*|
> |primitiveFields, throughput|34.9M ops/s|35.3M ops/s|within error|
> Master's direct-buffer churn is invisible to gc.alloc.rate.norm but shows up 
> as GC time: Cleaner processing makes collections an order of magnitude more 
> expensive at the same collection counts.
> *Testing*
> * DirectMarshallingMessagesTest — nested containers written through 16-byte 
> chunks (multi-pass resume across buffer swaps, exercises the curStream 
> refresh points).
> * CompressedMessageTest — >40K compressed payload (exercises the scratch 
> growth path), byte-for-byte writer-to-reader round-trip.

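For readers skimming the thread, item 1 of the Fix list in the quoted description (cache the current stream, refresh it only when the current state item changes) can be illustrated roughly as follows. This is a minimal sketch, not the patch itself: `StateItem` and `CachingWriter` are hypothetical stand-ins, a plain `ByteBuffer` plays the role of the stream, and only the method names `beforeNestedWrite` / `afterNestedWrite` and the field name `curStream` are taken from the description.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for a per-nesting-level state item that owns a stream. */
final class StateItem {
    final ByteBuffer stream = ByteBuffer.allocate(64);
}

/** Hypothetical writer: caches the current item's stream instead of re-resolving it per write. */
final class CachingWriter {
    private final StateItem[] items = { new StateItem(), new StateItem() };

    /** Index of the current state item (nesting level). */
    private int pos;

    /** Cached stream of the current item; refreshed only when pos changes. */
    private ByteBuffer curStream = items[0].stream;

    /** Hot path: a single field load, no array load or bounds check per write. */
    void writeInt(int v) {
        curStream.putInt(v);
    }

    /** Descends one nesting level and refreshes the cache. */
    void beforeNestedWrite() {
        pos++;
        curStream = items[pos].stream;
    }

    /** Returns to the parent level and refreshes the cache. */
    void afterNestedWrite() {
        pos--;
        curStream = items[pos].stream;
    }

    /** Number of bytes written so far at the given nesting level. */
    int written(int level) {
        return items[level].stream.position();
    }
}
```

The point of the sketch is the invariant: `curStream` always equals `items[pos].stream`, so every place that changes `pos` (or swaps the underlying buffer) has to refresh it, and no other place does.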

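Item 2 of the Fix list (one reusable heap scratch buffer, retained at the largest size seen and grown without an intermediate byte[] copy) might look roughly like the sketch below. `ReusableScratch`, `fill` and `grow` are hypothetical names, the 10K initial size comes from the description, and the doubling growth factor is an assumption borrowed from the description of master's behavior rather than something the patch is confirmed to use.

```java
import java.nio.ByteBuffer;

/** Hypothetical holder of a single heap scratch buffer that is reused across calls. */
final class ReusableScratch {
    /** Initial capacity, matching the 10K figure from the issue description. */
    static final int INITIAL_CAPACITY = 10 * 1024;

    /** Retained between calls, so it stays at the largest size seen so far. */
    private ByteBuffer scratch = ByteBuffer.allocate(INITIAL_CAPACITY);

    /**
     * Copies the payload into the scratch buffer, growing it when it overflows.
     * The returned buffer is flipped for reading and is only valid until the next call.
     */
    ByteBuffer fill(byte[] payload) {
        scratch.clear();
        int off = 0;

        while (off < payload.length) {
            if (!scratch.hasRemaining())
                grow();

            int n = Math.min(scratch.remaining(), payload.length - off);
            scratch.put(payload, off, n);
            off += n;
        }

        scratch.flip();
        return scratch;
    }

    /** Doubles the capacity (assumed factor) with a direct buffer-to-buffer copy. */
    private void grow() {
        ByteBuffer bigger = ByteBuffer.allocate(scratch.capacity() * 2);

        scratch.flip();
        bigger.put(scratch); // No intermediate byte[] is materialized here.
        scratch = bigger;
    }

    /** Current capacity of the retained scratch buffer. */
    int capacity() {
        return scratch.capacity();
    }
}
```

Because the buffer is handed out and then reused, this only works if the consumer finishes with it before the next call, which is the property the description attributes to `CompressedMessage` consuming the buffer in its constructor.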

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
