[
https://issues.apache.org/jira/browse/ASTERIXDB-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233628#comment-16233628
]
ASF subversion and git services commented on ASTERIXDB-2133:
------------------------------------------------------------
Commit c04046c11be062c4320ba7e0bbd5258ffb600ad3 in asterixdb's branch
refs/heads/master from [~luochen01]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=c04046c ]
[ASTERIXDB-2133] Fix unncessary binary search in GroupFrameAccessor
- user model changes: no
- storage format changes: no
- interface changes: no
Details:
- GroupFrameAccessor holds a list of frames from a run during the merge
step of merge sort. However, everytime we access a tuple, it performs
binary search to get the physical tuple index. This patch fixes this
by remembering the last accessed frame. It is expected that tuples
are accessed sequentially (since it's the merge step), which greatly
reduces binary searches
Change-Id: I4a1b19ad47f6b1dda4bd5c417932e4c9ba36a714
Reviewed-on: https://asterix-gerrit.ics.uci.edu/2079
Sonar-Qube: Jenkins <[email protected]>
Tested-by: Jenkins <[email protected]>
Contrib: Jenkins <[email protected]>
Integration-Tests: Jenkins <[email protected]>
Reviewed-by: Ian Maxon <[email protected]>
> Unnecessary BinarySearch in GroupFrameAccessor
> ----------------------------------------------
>
> Key: ASTERIXDB-2133
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2133
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: HYR - Hyracks
> Reporter: Chen Luo
> Assignee: Chen Luo
> Priority: Major
>
> During the merge step of merge sort, if there is enough memory but only a few
> of runs to be merged, we would load multiple frames per run into the
> GroupFrameAccessor. Every time when we access a tuple, GroupFrameAccessor
> performs binary search over the inner frames to translate logical tuple index
> into the physical one (inner frame Id + index).
> However, this is highly inefficient, and partially results in the fact that
> more memory budget of the sort operation would result in slower performance.
> Since GroupFrameAccessor is only used by merge sort, it is expected that
> tuples are accessed sequentially, instead of randomly. Specially
> optimizations can be adopted based on this sequentially access pattern.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)