Ali Alsuliman created ASTERIXDB-3172:
----------------------------------------
Summary: Result tuples unnecessarily serialized twice
Key: ASTERIXDB-3172
URL: https://issues.apache.org/jira/browse/ASTERIXDB-3172
Project: Apache AsterixDB
Issue Type: Bug
Components: RT - Runtime
Affects Versions: 0.9.6
Reporter: Ali Alsuliman
Assignee: Ali Alsuliman
Fix For: 0.9.9
The ResultWriterOperatorDescriptor is the operator that persists the query
result to disk. Each partition persists its portion of the result by
serializing the tuples (that are in ADM format) as JSON strings into a byte
array. This byte array that represents the JSON is added into a frame that is
used to write the accumulated tuples to the result file. If the byte array is
added to the frame successfully, the byte array is reset and the next tuple is
serialized into it. However, if the byte array cannot be added because the
frame is full, the frame is flushed to disk and emptied, and the byte array is
reset at the same time. The tuple then has to be serialized into the byte array
a second time before it can be added to the frame. This is expensive,
especially for large tuples.
The byte array should not be reset when the frame is flushed because it cannot
hold the array. Instead, the frame should be flushed without resetting the byte
array, and adding the byte array to the frame should then be attempted again.
The byte array should be reset only after it has been added successfully.
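
The proposed control flow can be illustrated with a minimal, self-contained
sketch. It does not use the actual ResultWriterOperatorDescriptor or Hyracks
frame classes: Frame, writeAll, and the use of plain strings in place of
ADM-to-JSON serialization are all hypothetical stand-ins, and the sketch assumes
every tuple fits into an empty frame.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

class ResultWriterSketch {

    /** Hypothetical fixed-capacity frame; not the real Hyracks frame API. */
    static final class Frame {
        private final byte[] buffer;
        private int used;
        int flushes;

        Frame(int capacity) {
            buffer = new byte[capacity];
        }

        /** Copies the bytes into the frame, or returns false if they do not fit. */
        boolean append(byte[] data) {
            if (data.length > buffer.length - used) {
                return false;
            }
            System.arraycopy(data, 0, buffer, used, data.length);
            used += data.length;
            return true;
        }

        /** Stands in for writing the frame to the result file, then empties it. */
        void flush() {
            used = 0;
            flushes++;
        }

        boolean isEmpty() {
            return used == 0;
        }
    }

    /** Returns {number of serializations, number of frame flushes}. */
    static int[] writeAll(List<String> tuples, int frameCapacity) {
        Frame frame = new Frame(frameCapacity);
        ByteArrayOutputStream tupleBytes = new ByteArrayOutputStream();
        int serializations = 0;

        for (String tuple : tuples) {
            // Assumption: encoding the string stands in for the costly
            // ADM-to-JSON serialization of one tuple.
            byte[] json = tuple.getBytes(StandardCharsets.UTF_8);
            tupleBytes.write(json, 0, json.length);
            serializations++;

            // toByteArray() copies; acceptable for a sketch only.
            byte[] serialized = tupleBytes.toByteArray();
            if (!frame.append(serialized)) {
                // Frame is full: flush it, but keep the serialized bytes.
                frame.flush();
                if (!frame.append(serialized)) {
                    throw new IllegalStateException("tuple larger than an empty frame");
                }
            }
            // Reset only after the bytes are safely in the frame.
            tupleBytes.reset();
        }

        if (!frame.isEmpty()) {
            frame.flush();
        }
        return new int[] { serializations, frame.flushes };
    }

    public static void main(String[] args) {
        int[] stats = writeAll(List.of("aaaa", "bbbb", "cc"), 8);
        System.out.println("serializations=" + stats[0] + ", flushes=" + stats[1]);
    }
}
```

In this sketch the number of serializations always equals the number of
tuples, regardless of how often the frame fills up, which is the behavior the
fix is meant to achieve.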