Anton Vinogradov created IGNITE-28847:
-----------------------------------------

             Summary: REST (memcached): eliminate intermediate buffer copies 
when encoding responses
                 Key: IGNITE-28847
                 URL: https://issues.apache.org/jira/browse/IGNITE-28847
             Project: Ignite
          Issue Type: Task
            Reporter: Anton Vinogradov
            Assignee: Anton Vinogradov


h3. Problem

{{GridTcpRestParser#encodeMemcache}} copies each response key/value payload up 
to 4 times before it reaches the wire:
# {{encodeObj}} writes the encoded bytes into an intermediate 
{{ByteArrayOutputStream}} (copy into the BAOS internal buffer, with growth 
reallocations on the way);
# {{ByteArrayOutputStream#toByteArray()}} produces another full copy;
# that array is appended to a {{GridByteArrayList}} created with capacity 
{{HDR_LEN}} (24 bytes) only, so the list keeps doubling and re-copying its 
internal array while the payload is appended;
# {{GridByteArrayList#entireArray()}} makes a final trimming copy because the 
internal array is larger than the actual packet.
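
For illustration only, the four copies listed above can be reproduced with plain JDK classes. This is a minimal sketch and not the Ignite code: {{OldPathSketch}} and its {{encode}} method are hypothetical names, a plain {{byte[]}} that doubles on demand stands in for {{GridByteArrayList}}, and the 24-byte header is left zeroed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative stand-in for the old encoding path; hypothetical, not the Ignite code. */
final class OldPathSketch {
    /** Header length in bytes, as stated in the issue. */
    static final int HDR_LEN = 24;

    /** Builds a zeroed header followed by the payload, copying the payload four times. */
    static byte[] encode(String val) {
        byte[] src = val.getBytes(StandardCharsets.UTF_8);

        // Copy 1: the payload is written into the BAOS internal buffer (which may grow).
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write(src, 0, src.length);

        // Copy 2: toByteArray() always returns a freshly allocated array.
        byte[] payload = baos.toByteArray();

        // Copy 3: a buffer sized for the header only has to double (and re-copy
        // its contents) until the payload fits, then the payload is appended.
        byte[] buf = new byte[HDR_LEN];
        int size = HDR_LEN;
        while (buf.length < size + payload.length)
            buf = Arrays.copyOf(buf, buf.length * 2);
        System.arraycopy(payload, 0, buf, size, payload.length);
        size += payload.length;

        // Copy 4: the doubled buffer is larger than the packet, so it is trimmed.
        return Arrays.copyOf(buf, size);
    }

    public static void main(String[] args) {
        System.out.println("packet length: " + encode("abc").length);
    }
}
```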

As a side effect, {{encodeMemcache}} also mutates the message being encoded 
({{msg.key(...)}} / {{msg.value(...)}} are overwritten with the serialized 
{{byte[]}}).

h3. Change

* {{encodeObj}} returns {{T2<byte[], Integer>}} (encoded bytes + type flags) 
instead of writing into a caller-provided {{ByteArrayOutputStream}}. For all 
fixed-width and {{String}}/{{byte[]}} payloads the encoded array is produced 
directly; for JDK-serialized objects {{U.marshal(marsh, obj)}} is used instead 
of marshalling into a BAOS.
* {{encodeMemcache}} computes the exact packet size up front and allocates 
{{GridByteArrayList(HDR_LEN + flagsLen + keyLen + dataLen)}}. The list never 
grows, so {{entireArray()}} returns its internal array without copying. The 
only remaining copy is the single append of key/value bytes into the packet 
buffer.
* The side-effecting mutation of {{msg.key()}} / {{msg.value()}} during 
encoding is removed.

Net effect for {{String}}/{{byte[]}} payloads: 4 full payload copies → 1.
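
The exact-size allocation described above can be sketched with plain arrays. This is an assumption-laden illustration, not the patch itself: {{ExactSizeSketch}} and {{encode}} are hypothetical names, a plain {{byte[]}} stands in for {{GridByteArrayList}}, the header is left zeroed, and the flags/key/data ordering follows the size expression in the issue.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative exact-size packet assembly; hypothetical, not the Ignite code. */
final class ExactSizeSketch {
    /** Header length in bytes, as stated in the issue. */
    static final int HDR_LEN = 24;

    /** Allocates the packet once and appends each part with a single copy. */
    static byte[] encode(byte[] flags, byte[] key, byte[] data) {
        // The total size is known up front, so the buffer never grows
        // and no trimming copy is needed at the end.
        byte[] packet = new byte[HDR_LEN + flags.length + key.length + data.length];

        int off = HDR_LEN; // Header bytes are left zeroed in this sketch.

        System.arraycopy(flags, 0, packet, off, flags.length);
        off += flags.length;

        System.arraycopy(key, 0, packet, off, key.length);
        off += key.length;

        System.arraycopy(data, 0, packet, off, data.length);

        return packet;
    }

    public static void main(String[] args) {
        byte[] packet = encode(
            new byte[4],
            "k".getBytes(StandardCharsets.UTF_8),
            "value".getBytes(StandardCharsets.UTF_8));

        System.out.println("packet length: " + packet.length);
    }
}
```

Because the returned array is the one that was allocated, the only payload copy left in this sketch is the {{System.arraycopy}} append.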

Wire format is *unchanged*: packets are byte-for-byte identical. This is 
verified in the benchmark setup, where the old and new encoders are replicated 
verbatim and their outputs are compared with {{Arrays.equals}} for every 
payload; the benchmark aborts on any mismatch.
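
A check of this kind might look like the sketch below. It is hypothetical and not taken from the benchmark: {{WireCompatCheck}} and {{verify}} are illustrative names, and the two encoders are passed in as plain functions from payload to packet.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Illustrative old-vs-new encoder comparison; hypothetical helper. */
final class WireCompatCheck {
    /** Throws if the two encoders disagree on any of the given payloads. */
    static void verify(
        Function<byte[], byte[]> oldEnc,
        Function<byte[], byte[]> newEnc,
        List<byte[]> payloads
    ) {
        for (byte[] payload : payloads) {
            byte[] expected = oldEnc.apply(payload);
            byte[] actual = newEnc.apply(payload);

            // Arrays.equals compares length and every element, so any
            // difference in the produced bytes is detected.
            if (!Arrays.equals(expected, actual))
                throw new IllegalStateException(
                    "Encoder output mismatch for payload of length " + payload.length);
        }
    }
}
```

Calling such a check from the benchmark's setup phase would make a mismatch fail before any measurement starts.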

h3. Benchmark

JMH 1.37, {{Mode.AverageTime}}, 1 fork, 3×1s warmup, 5×1s measurement, 
{{-prof gc}}; Apple Silicon, JDK 17 (Amazon Corretto 17.0.11); key = short 
{{String}}, payloads: 64-byte {{String}}, 1 KiB {{String}}, 8 KiB 
{{byte[]}}, {{HashMap}} of 10 entries (JDK serialization).

||Payload||old, ns/op||new, ns/op||Time||old, B/op||new, B/op||Alloc||
|STR_64|50.6 ± 0.8|21.3 ± 1.9|−58%|872|240|−72%|
|STR_1K|208.9 ± 20.2|73.8 ± 9.4|−65%|6,632|2,160|−67%|
|BYTES_8K|1,596.0 ± 110.4|433.0 ± 9.1|−73%|41,432|8,352|−80%|
|OBJ_MAP|962.3 ± 138.8|1,094.3 ± 194.1|parity ^1^|5,528|4,184|−24%|

^1^ JDK-serialized payloads are dominated by serialization cost itself; the 
time difference is within the error bars, while per-op allocations still drop.

h3. Testing

{{TcpRestParserSelfTest}}, {{RestMemcacheProtocolSelfTest}}, 
{{ClientMemcachedProtocolSelfTest}}: 38/38 pass.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
