Anton Vinogradov created IGNITE-28847:
-----------------------------------------
Summary: REST (memcached): eliminate intermediate buffer copies
when encoding responses
Key: IGNITE-28847
URL: https://issues.apache.org/jira/browse/IGNITE-28847
Project: Ignite
Issue Type: Task
Reporter: Anton Vinogradov
Assignee: Anton Vinogradov
h3. Problem
{{GridTcpRestParser#encodeMemcache}} copies each response key/value payload up
to 4 times before it reaches the wire:
# {{encodeObj}} writes the encoded bytes into an intermediate
{{ByteArrayOutputStream}} (a copy into the BAOS internal buffer, plus
reallocations as that buffer grows);
# {{ByteArrayOutputStream#toByteArray()}} produces another full copy;
# that array is appended to a {{GridByteArrayList}} created with a capacity of
only {{HDR_LEN}} (24 bytes), so the list keeps doubling and re-copying its
internal array as the payload is appended;
# {{GridByteArrayList#entireArray()}} makes a final trimming copy because the
internal array is larger than the actual packet.
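The copy chain above can be sketched with JDK classes only. This is a hypothetical reconstruction, not the actual Ignite code: {{OldEncodePath}} is an illustrative name, and {{GrowableList}} is a simplified stand-in for {{GridByteArrayList}} whose growth policy (double until the data fits) is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical reconstruction of the old encoding path. GrowableList is a
// simplified stand-in for GridByteArrayList, not Ignite's actual class.
class OldEncodePath {
    static final int HDR_LEN = 24; // memcached binary header length

    /** Minimal byte list that doubles its capacity when it runs out of room (assumed policy). */
    static final class GrowableList {
        private byte[] arr;
        private int size;

        GrowableList(int cap) {
            arr = new byte[cap];
        }

        void add(byte[] src) {
            if (size + src.length > arr.length) {
                int cap = arr.length;
                while (cap < size + src.length)
                    cap *= 2;
                // Growth re-copies everything written so far.
                arr = Arrays.copyOf(arr, cap);
            }
            System.arraycopy(src, 0, arr, size, src.length);
            size += src.length;
        }

        /** Returns the packet; copies again if the array is larger than the content. */
        byte[] entireArray() {
            return arr.length == size ? arr : Arrays.copyOf(arr, size);
        }
    }

    static byte[] encode(byte[] payload) {
        // Copy 1: into the BAOS internal buffer (which may itself reallocate).
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(payload, 0, payload.length);

        // Copy 2: toByteArray() allocates and fills a fresh array.
        byte[] encoded = out.toByteArray();

        // Copy 3: appended to a list sized for the header only, so it must grow.
        GrowableList list = new GrowableList(HDR_LEN);
        list.add(new byte[HDR_LEN]); // header placeholder
        list.add(encoded);

        // Copy 4: trimming copy, since the doubled array usually overshoots the packet.
        return list.entireArray();
    }
}
```

For a 64-byte payload the list grows 24 → 48 → 96 bytes in one reallocation and is then trimmed to the 88-byte packet, so the payload bytes are written four times in total.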
As a side effect, {{encodeMemcache}} also mutates the message being encoded
({{msg.key(...)}} / {{msg.value(...)}} are overwritten with the serialized
{{byte[]}}).
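A minimal illustration of why this mutation is a problem, assuming a hypothetical {{Msg}} class with a single {{key}} field standing in for the REST memcached message (not Ignite's actual class): after the mutating encode, the caller's {{String}} key has become a {{byte[]}}, so encoding the same message again (for example on a retry) fails with a {{ClassCastException}}.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the side effect. Msg is a stand-in for the
// REST memcached message object, not Ignite's actual class.
class SideEffectDemo {
    static final class Msg {
        Object key;

        Msg(Object key) {
            this.key = key;
        }
    }

    /** Old style: encodes the key and overwrites it in the message. */
    static byte[] encodeMutating(Msg msg) {
        byte[] bytes = ((String) msg.key).getBytes(StandardCharsets.UTF_8);
        msg.key = bytes; // the caller now sees a byte[] where it stored a String
        return bytes;
    }

    /** New style: encodes into a local and leaves the message untouched. */
    static byte[] encodePure(Msg msg) {
        return ((String) msg.key).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Msg a = new Msg("k1");
        encodeMutating(a);
        System.out.println(a.key instanceof String); // false

        Msg b = new Msg("k1");
        encodePure(b);
        System.out.println(b.key instanceof String); // true
    }
}
```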
h3. Change
* {{encodeObj}} returns {{T2<byte[], Integer>}} (encoded bytes + type flags)
instead of writing into a caller-provided {{ByteArrayOutputStream}}. For all
fixed-width and {{String}}/{{byte[]}} payloads the encoded array is produced
directly; for JDK-serialized objects {{U.marshal(marsh, obj)}} is used instead
of marshalling into a BAOS.
* {{encodeMemcache}} computes the exact packet size up front and allocates
{{GridByteArrayList(HDR_LEN + flagsLen + keyLen + dataLen)}}. The list never
grows, so {{entireArray()}} returns its internal array without copying. The
only remaining copy is the single append of key/value bytes into the packet
buffer.
* The side-effecting mutation of {{msg.key()}} / {{msg.value()}} during
encoding is removed.
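The first two bullets can be sketched as follows. This is an illustrative approximation, not the patch: {{Encoded}} stands in for {{T2<byte[], Integer>}}, {{java.nio.ByteBuffer}} replaces {{GridByteArrayList}}, only {{String}} and {{byte[]}} payloads are handled, and the 4-byte flags field and the flag values are assumptions. The 24-byte header follows the memcached binary protocol layout (magic, opcode, key length, extras length, data type, status, total body length, opaque, CAS).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the exact-size approach. Encoded stands in for
// T2<byte[], Integer>; FLAGS_LEN and the flag values are assumptions.
class ExactSizeEncoder {
    static final int HDR_LEN = 24;      // memcached binary header length
    static final int FLAGS_LEN = 4;     // assumed extras length (type flags)
    static final int STRING_FLAG = 0;   // hypothetical flag value
    static final int BYTE_ARR_FLAG = 1; // hypothetical flag value

    /** Encoded bytes plus type flags, produced directly without a BAOS. */
    static final class Encoded {
        final byte[] bytes;
        final int flags;

        Encoded(byte[] bytes, int flags) {
            this.bytes = bytes;
            this.flags = flags;
        }
    }

    static Encoded encodeObj(Object obj) {
        if (obj instanceof String)
            return new Encoded(((String) obj).getBytes(StandardCharsets.UTF_8), STRING_FLAG);
        if (obj instanceof byte[])
            return new Encoded((byte[]) obj, BYTE_ARR_FLAG); // used as-is, no copy
        throw new IllegalArgumentException("Not covered by this sketch: " + obj.getClass());
    }

    /** Sizes the packet exactly, so the buffer never grows and needs no trimming copy. */
    static byte[] encode(byte opcode, Object keyObj, Object valObj) {
        byte[] key = encodeObj(keyObj).bytes;
        Encoded val = encodeObj(valObj);
        int keyLen = key.length;
        int dataLen = val.bytes.length;

        ByteBuffer buf = ByteBuffer.allocate(HDR_LEN + FLAGS_LEN + keyLen + dataLen);

        // Header fields, big-endian (ByteBuffer's default byte order).
        buf.put((byte) 0x81);                     // magic: response
        buf.put(opcode);                          // opcode
        buf.putShort((short) keyLen);             // key length
        buf.put((byte) FLAGS_LEN);                // extras length
        buf.put((byte) 0);                        // data type
        buf.putShort((short) 0);                  // status
        buf.putInt(FLAGS_LEN + keyLen + dataLen); // total body length
        buf.putInt(0);                            // opaque
        buf.putLong(0L);                          // CAS

        buf.putInt(val.flags); // extras: type flags
        buf.put(key);          // single copy of the key bytes
        buf.put(val.bytes);    // single copy of the value bytes

        // The buffer is exactly full, so its backing array is the finished packet.
        return buf.array();
    }
}
```

Because the buffer is allocated with the exact packet size and filled completely, {{buf.array()}} returns the backing array itself; there is no growth copy and no trimming copy, and the payload is copied only by the two {{put}} calls.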
Net effect for {{String}}/{{byte[]}} payloads: 4 full payload copies → 1.
Wire format is *unchanged* (byte-for-byte identical packets, verified during
benchmark setup: the old and new encoders are replicated verbatim, their
outputs are compared with {{Arrays.equals}} for every payload, and the
benchmark aborts on any mismatch).
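A hypothetical harness for that kind of check might look like the following. {{verifyIdentical}} and the two stand-in encoders ({{viaBaos}}, {{exactSize}}) are illustrative names, not the actual benchmark code; the stand-ins only write a zeroed 24-byte header followed by UTF-8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical harness: runs two encoders on every payload and aborts on the
// first byte-level difference. The encoders below are stand-ins only.
class WireFormatCheck {
    static <T> void verifyIdentical(List<T> payloads,
                                    Function<T, byte[]> oldEnc,
                                    Function<T, byte[]> newEnc) {
        for (T p : payloads) {
            if (!Arrays.equals(oldEnc.apply(p), newEnc.apply(p)))
                throw new IllegalStateException("Wire format mismatch for payload: " + p);
        }
    }

    /** Stand-in "old" encoder: zeroed 24-byte header, then UTF-8 bytes, via a BAOS. */
    static byte[] viaBaos(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[24], 0, 24);
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
        return out.toByteArray();
    }

    /** Stand-in "new" encoder: same layout written into an exact-size array. */
    static byte[] exactSize(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        byte[] pkt = new byte[24 + b.length];
        System.arraycopy(b, 0, pkt, 24, b.length);
        return pkt;
    }

    public static void main(String[] args) {
        verifyIdentical(List.of("", "a", "x".repeat(1024)),
            WireFormatCheck::viaBaos, WireFormatCheck::exactSize);
        System.out.println("identical");
    }
}
```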
h3. Benchmark
JMH 1.37, {{Mode.AverageTime}}, 1 fork, 3×1s warmup, 5×1s measurement,
{{-prof gc}}; Apple Silicon, JDK 17 (Amazon Corretto 17.0.11); key = short
{{String}}, payloads: 64-byte {{String}}, 1 KiB {{String}}, 8 KiB
{{byte[]}}, {{HashMap}} of 10 entries (JDK serialization).
||Payload||old, ns/op||new, ns/op||Time||old, B/op||new, B/op||Alloc||
|STR_64|50.6 ± 0.8|21.3 ± 1.9|−58%|872|240|−72%|
|STR_1K|208.9 ± 20.2|73.8 ± 9.4|−65%|6,632|2,160|−67%|
|BYTES_8K|1,596.0 ± 110.4|433.0 ± 9.1|−73%|41,432|8,352|−80%|
|OBJ_MAP|962.3 ± 138.8|1,094.3 ± 194.1|parity ^1^|5,528|4,184|−24%|
^1^ JDK-serialized payloads are dominated by serialization cost itself; the
time difference is within the error bars, while per-op allocations still drop.
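As a sanity check, the Time and Alloc columns can be recomputed from the raw means in the table. The overlap test below is only a coarse reading of "within the error bars" (it checks whether the two mean ± error intervals intersect), not a formal significance test; {{RelativeChange}} is an illustrative name.

```java
// Recomputes the relative-change columns of the benchmark table from its raw
// means, plus a coarse check that the OBJ_MAP timing intervals overlap.
class RelativeChange {
    /** Percentage change from oldVal to newVal, rounded to the nearest integer. */
    static long pct(double oldVal, double newVal) {
        return Math.round((newVal - oldVal) / oldVal * 100.0);
    }

    /** True if the intervals m1 ± e1 and m2 ± e2 intersect. */
    static boolean overlaps(double m1, double e1, double m2, double e2) {
        return Math.abs(m1 - m2) <= e1 + e2;
    }

    public static void main(String[] args) {
        System.out.println(pct(50.6, 21.3));    // -58 (STR_64 time)
        System.out.println(pct(872, 240));      // -72 (STR_64 alloc)
        System.out.println(pct(208.9, 73.8));   // -65 (STR_1K time)
        System.out.println(pct(6632, 2160));    // -67 (STR_1K alloc)
        System.out.println(pct(1596.0, 433.0)); // -73 (BYTES_8K time)
        System.out.println(pct(41432, 8352));   // -80 (BYTES_8K alloc)
        System.out.println(pct(5528, 4184));    // -24 (OBJ_MAP alloc)
        System.out.println(pct(962.3, 1094.3)); // 14 (OBJ_MAP time)
        System.out.println(overlaps(962.3, 138.8, 1094.3, 194.1)); // true
    }
}
```

The OBJ_MAP mean time is about 14% higher, but the 132 ns gap is well inside the combined ±332.9 ns error, which is consistent with the "parity" reading in the table.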
h3. Testing
{{TcpRestParserSelfTest}}, {{RestMemcacheProtocolSelfTest}},
{{ClientMemcachedProtocolSelfTest}} — 38/38 pass.