Ali Alsuliman created ASTERIXDB-3172:
----------------------------------------

             Summary: Result tuples unnecessarily serialized twice
                 Key: ASTERIXDB-3172
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3172
             Project: Apache AsterixDB
          Issue Type: Bug
          Components: RT - Runtime
    Affects Versions: 0.9.6
            Reporter: Ali Alsuliman
            Assignee: Ali Alsuliman
             Fix For: 0.9.9


The ResultWriterOperatorDescriptor is the operator that persists the query 
result to disk. Each partition persists its portion of the result by 
serializing the tuples (that are in ADM format) as JSON strings into a byte 
array. This byte array that represents the JSON is added into a frame that is 
used to write the accumulated tuples to the result file. If the byte array is 
added to the frame successfully, the byte array is reset and the next tuple is 
serialized into it. However, if the byte array cannot be added to the frame 
because the frame is full, the frame is flushed to disk and emptied, but the 
byte array is also reset at the same time. As a result, the tuple has to be 
re-serialized into the byte array before it can be added to the frame. This 
is expensive, especially for large tuples.

The byte array should not be reset when the frame is flushed because it cannot 
hold the byte array. Instead, the frame should be flushed without resetting the 
byte array, and adding the byte array to the frame should then be attempted 
again. Once the byte array has been added successfully, it should be reset.
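
The proposed flow can be illustrated with a minimal, self-contained sketch. 
This is not the actual ResultWriterOperatorDescriptor code: the Frame class, 
serialize(), and writeAll() below are hypothetical stand-ins, the "JSON" 
serialization is reduced to the tuple text plus a newline, and a tuple larger 
than an empty frame is simply rejected. The point is only that each tuple is 
serialized once and the byte array is reset only after a successful append.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ResultWriterSketch {
    /** Hypothetical fixed-capacity frame; NOT the Hyracks frame API. */
    static final class Frame {
        private final byte[] data;
        private int size;

        Frame(int capacity) {
            data = new byte[capacity];
        }

        /** Appends len bytes if they fit; returns false when the frame is too full. */
        boolean append(byte[] src, int len) {
            if (len > data.length - size) {
                return false;
            }
            System.arraycopy(src, 0, data, size, len);
            size += len;
            return true;
        }

        /** Returns the accumulated bytes (standing in for a disk write) and empties the frame. */
        byte[] flush() {
            byte[] out = Arrays.copyOf(data, size);
            size = 0;
            return out;
        }
    }

    /** Counts serializations so the "once per tuple" property can be checked. */
    static int serializeCalls = 0;

    /** Stand-in for the ADM-to-JSON printer: writes the tuple text plus a newline. */
    static void serialize(String tuple, ByteArrayOutputStream buf) {
        serializeCalls++;
        byte[] bytes = (tuple + "\n").getBytes(StandardCharsets.UTF_8);
        buf.write(bytes, 0, bytes.length);
    }

    /** Writes all tuples through one frame and returns the flushed chunks in order. */
    static List<byte[]> writeAll(List<String> tuples, int frameCapacity) {
        Frame frame = new Frame(frameCapacity);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        List<byte[]> flushed = new ArrayList<>();
        for (String tuple : tuples) {
            serialize(tuple, buf); // serialized exactly once
            byte[] bytes = buf.toByteArray();
            if (!frame.append(bytes, bytes.length)) {
                // Frame is full: flush it, but keep the serialized bytes.
                flushed.add(frame.flush());
                // Retry with the same bytes instead of re-serializing the tuple.
                if (!frame.append(bytes, bytes.length)) {
                    throw new IllegalStateException("tuple larger than an empty frame");
                }
            }
            buf.reset(); // reset only after the bytes are in the frame
        }
        if (frame.size > 0) {
            flushed.add(frame.flush()); // flush whatever remains at the end
        }
        return flushed;
    }
}
```

For example, writing three 4-character tuples through a 12-byte frame would 
flush twice (two tuples, then one) while calling serialize() only three times.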



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
