[
https://issues.apache.org/jira/browse/ASTERIXDB-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737214#comment-17737214
]
ASF subversion and git services commented on ASTERIXDB-3172:
------------------------------------------------------------
Commit 06e4a33215c7df3879ff74a6469b75e946aabfcb in asterixdb's branch
refs/heads/master from Ali Alsuliman
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=06e4a33215 ]
[ASTERIXDB-3172][RT] Do not reset byte array holding serialized tuple
- user model changes: no
- storage format changes: no
- interface changes: no
Details:
In the ResultWriterOperatorDescriptor, the frameOutputStream
should not reset the byte array that holds the serialized
tuple when adding the tuple to the frame (appendTuple()).
Resetting it forces the tuple to be serialized again into
the byte array whenever appendTuple() fails because the
frame is full.
Change-Id: Ibaaac339065a30f58e2bc7f39800a506f959549d
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17501
Integration-Tests: Jenkins <[email protected]>
Tested-by: Jenkins <[email protected]>
Reviewed-by: Murtadha Hubail <[email protected]>
(cherry picked from commit 4f18020796c78bb4455bd7bec2946f83650da427)
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17567
Reviewed-by: Murtadha Al Hubail <[email protected]>
Reviewed-by: Ali Alsuliman <[email protected]>
> Result tuples unnecessarily serialized twice
> --------------------------------------------
>
> Key: ASTERIXDB-3172
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-3172
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: RT - Runtime
> Affects Versions: 0.9.6
> Reporter: Ali Alsuliman
> Assignee: Ali Alsuliman
> Priority: Major
> Labels: triaged
> Fix For: 0.9.9
>
>
> The ResultWriterOperatorDescriptor is the operator that persists the query
> result to disk. Each partition persists its portion of the result by
> serializing the tuples (that are in ADM format) as JSON strings into a byte
> array. This byte array that represents the JSON is added into a frame that is
> used to write the accumulated tuples to the result file. If the byte array is
> added to the frame successfully, the byte array is reset and the next tuple
> is serialized into it. However, if the byte array couldn't be added to the
> frame because the frame is full, the frame is flushed to disk and emptied,
> but the byte array is also reset at the same time. The tuple then has to be
> serialized into the byte array a second time before it can be added to the
> frame, which is expensive, especially for large tuples.
> The byte array should not be reset when the frame is flushed because it
> cannot hold the byte array. Instead, the frame should be flushed without
> resetting the byte array, and adding the byte array to the frame should then
> be attempted again. Only once the byte array has been added successfully
> should it be reset.
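
The intended control flow can be illustrated with a minimal sketch. This is
not the actual ResultWriterOperatorDescriptor code: the Frame, Stats and
writeAll names below are hypothetical stand-ins, tuples are modelled as plain
strings, and a tuple larger than an empty frame is simply rejected. The point
is only that each tuple is serialized once, the frame is flushed on a failed
append, and the byte array is reset only after a successful append.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical stand-ins for illustration; not the AsterixDB/Hyracks classes.
final class ResultWriterSketch {

    /** Fixed-capacity frame that refuses a tuple that would overflow it. */
    static final class Frame {
        private final int capacity;
        private int used;
        private int flushes;

        Frame(int capacity) {
            this.capacity = capacity;
        }

        /** Returns false (and stores nothing) when the frame is too full. */
        boolean append(int length) {
            if (used + length > capacity) {
                return false;
            }
            used += length;
            return true;
        }

        /** Pretends to write the frame to the result file, then empties it. */
        void flush() {
            used = 0;
            flushes++;
        }
    }

    /** Counters returned so the behaviour can be checked. */
    static final class Stats {
        final int serializations;
        final int flushes;

        Stats(int serializations, int flushes) {
            this.serializations = serializations;
            this.flushes = flushes;
        }
    }

    static Stats writeAll(List<String> tuples, int frameCapacity) {
        Frame frame = new Frame(frameCapacity);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        int serializations = 0;

        for (String tuple : tuples) {
            // Serialize the tuple exactly once into the reusable byte array.
            byte[] bytes = tuple.getBytes(StandardCharsets.UTF_8);
            buffer.write(bytes, 0, bytes.length);
            serializations++;

            if (!frame.append(buffer.size())) {
                // Frame is full: flush it, but keep the serialized bytes.
                frame.flush();
                if (!frame.append(buffer.size())) {
                    throw new IllegalStateException("tuple larger than an empty frame");
                }
            }
            // Reset the byte array only after the append has succeeded.
            buffer.reset();
        }

        if (frame.used > 0) {
            frame.flush(); // write out the last, partially filled frame
        }
        return new Stats(serializations, frame.flushes);
    }
}
```

With three 4-byte tuples and a 10-byte frame, the third append fails once, the
frame is flushed, and the retry succeeds without serializing the tuple again,
so the serialization count stays equal to the number of tuples.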