[jira] [Commented] (ASTERIXDB-3172) Result tuples unnecessarily serialized twice

ASF subversion and git services (Jira) Mon, 26 Jun 2023 08:57:06 -0700


    [ https://issues.apache.org/jira/browse/ASTERIXDB-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737214#comment-17737214 ]


ASF subversion and git services commented on ASTERIXDB-3172:
------------------------------------------------------------

Commit 06e4a33215c7df3879ff74a6469b75e946aabfcb in asterixdb's branch 
refs/heads/master from Ali Alsuliman
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=06e4a33215 ]

[ASTERIXDB-3172][RT] Do not reset byte array holding serialized tuple

- user model changes: no
- storage format changes: no
- interface changes: no

Details:
In the ResultWriterOperatorDescriptor, the frameOutputStream
should not reset the byte array that holds the serialized
tuple when adding the tuple to the frame (appendTuple()).
Resetting it forces the tuple to be re-serialized into the
byte array whenever appendTuple() fails because the frame
is already full of tuples.

Change-Id: Ibaaac339065a30f58e2bc7f39800a506f959549d
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17501
Integration-Tests: Jenkins <[email protected]>
Tested-by: Jenkins <[email protected]>
Reviewed-by: Murtadha Hubail <[email protected]>
(cherry picked from commit 4f18020796c78bb4455bd7bec2946f83650da427)
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17567
Reviewed-by: Murtadha Al Hubail <[email protected]>
Reviewed-by: Ali Alsuliman <[email protected]>


> Result tuples unnecessarily serialized twice
> --------------------------------------------
>
>                 Key: ASTERIXDB-3172
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3172
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: RT - Runtime
>    Affects Versions: 0.9.6
>            Reporter: Ali Alsuliman
>            Assignee: Ali Alsuliman
>            Priority: Major
>              Labels: triaged
>             Fix For: 0.9.9
>
>
> The ResultWriterOperatorDescriptor is the operator that persists the query 
> result to disk. Each partition persists its portion of the result by 
> serializing the tuples (that are in ADM format) as JSON strings into a byte 
> array. This byte array that represents the JSON is added into a frame that is 
> used to write the accumulated tuples to the result file. If the byte array is 
> added to the frame successfully, the byte array is reset and the next tuple 
> is serialized into it. However, if the byte array couldn't be added to the 
> frame because the frame is full, the frame is flushed to disk and emptied, 
> but the byte array is also reset at the same time. This forces the tuple to 
> be re-serialized into the byte array before it is added to the frame again. 
> This becomes expensive, especially for large tuples.
> The byte array should not be reset when the frame is flushed because it 
> cannot hold the tuple. Instead, the frame should be flushed without resetting 
> the byte array, and adding the byte array to the frame should then be 
> attempted again. Once the byte array has been added successfully, it should 
> be reset.
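To make the cost difference concrete, here is a minimal, self-contained Java sketch of the two flows described in the issue. It is an illustration under stated assumptions, not the actual AsterixDB code. `Frame`, `serialize`, `writeBuggy` and `writeFixed` are hypothetical stand-ins, not Hyracks/AsterixDB APIs. The frame capacity is modeled as a simple byte budget, and "JSON serialization" is reduced to quoting a string. The only point is to count how often each tuple gets serialized.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class ResultWriterSketch {

    /** Hypothetical fixed-capacity frame (capacity in bytes); not the Hyracks frame API. */
    static final class Frame {
        private final int capacity;
        private int used;
        int flushes;

        Frame(int capacity) { this.capacity = capacity; }

        /** Adds the tuple bytes if they fit; returns false and adds nothing otherwise. */
        boolean appendTuple(byte[] bytes, int length) {
            if (used + length > capacity) {
                return false;
            }
            used += length;
            return true;
        }

        /** Stand-in for writing the frame to the result file and emptying it. */
        void flush() {
            if (used > 0) {
                flushes++;
            }
            used = 0;
        }
    }

    /** Hypothetical serializer: writes the tuple as a JSON string into the buffer. */
    static void serialize(String tuple, ByteArrayOutputStream out) {
        byte[] json = ("\"" + tuple + "\"").getBytes(StandardCharsets.UTF_8);
        out.write(json, 0, json.length);
    }

    /** Behavior before the fix: the buffer is reset on flush, so the tuple is serialized again. */
    static int writeBuggy(List<String> tuples, Frame frame) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int serializations = 0;
        for (String t : tuples) {
            serialize(t, buf);
            serializations++;
            if (!frame.appendTuple(buf.toByteArray(), buf.size())) {
                frame.flush();
                buf.reset();          // serialized bytes are discarded here...
                serialize(t, buf);    // ...so the same tuple is serialized a second time
                serializations++;
                if (!frame.appendTuple(buf.toByteArray(), buf.size())) {
                    // Assumption: oversized tuples are simply rejected in this sketch.
                    throw new IllegalStateException("tuple larger than a frame");
                }
            }
            buf.reset();
        }
        frame.flush();
        return serializations;
    }

    /** Behavior after the fix: flush, retry with the same bytes, reset only on success. */
    static int writeFixed(List<String> tuples, Frame frame) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int serializations = 0;
        for (String t : tuples) {
            serialize(t, buf);
            serializations++;
            if (!frame.appendTuple(buf.toByteArray(), buf.size())) {
                frame.flush();        // buffer is intentionally NOT reset
                if (!frame.appendTuple(buf.toByteArray(), buf.size())) {
                    // Assumption: oversized tuples are simply rejected in this sketch.
                    throw new IllegalStateException("tuple larger than a frame");
                }
            }
            buf.reset();              // reset only after the tuple is in the frame
        }
        frame.flush();
        return serializations;
    }

    public static void main(String[] args) {
        // Each tuple serializes to 8 bytes, so a 10-byte frame holds only one tuple.
        List<String> tuples = List.of("abcdef", "abcdef", "abcdef");
        System.out.println("buggy: " + writeBuggy(tuples, new Frame(10)) + " serializations");
        System.out.println("fixed: " + writeFixed(tuples, new Frame(10)) + " serializations");
    }
}
```

With three tuples and a frame that holds only one, `main` should report 5 serializations for the first flow and 3 for the second. The two tuples that arrive at a full frame are each serialized twice before the fix and once after it, while the number of frame flushes stays the same.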



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
