[
https://issues.apache.org/jira/browse/ASTERIXDB-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420073#comment-16420073
]
ASF subversion and git services commented on ASTERIXDB-2339:
------------------------------------------------------------
Commit 3036c98099f6c912af29bced9b2bf71bdfa9026e in asterixdb's branch
refs/heads/master from [~luochen01]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=3036c98 ]
[ASTERIXDB-2339] Add a new inverted index merge cursor
- user model changes: no
- storage format changes: no
- interface changes: no
Details:
- Implement a new inverted index merge cursor which uses two priority queues,
one for tokens and one for keys. For each token, we merge their inverted
lists using the key queue. After that, we fetch the next token and merge
their lists again. This reduces unnecessary token comparision a lot.
- Along this change, created a fast path for inverted index bulkloader.
Based on how the token+key pair is created, there is no need to copy
bulkloaded tuple and check whether it's a new token during merge.
Change-Id: I57d039cd7e08033884529a204bff9acffd96d9bb
Reviewed-on: https://asterix-gerrit.ics.uci.edu/2519
Tested-by: Jenkins <[email protected]>
Contrib: Jenkins <[email protected]>
Integration-Tests: Jenkins <[email protected]>
Reviewed-by: Ian Maxon <[email protected]>
> Improve Inverted Index Merge Performance
> ----------------------------------------
>
> Key: ASTERIXDB-2339
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2339
> Project: Apache AsterixDB
> Issue Type: Improvement
> Components: STO - Storage
> Reporter: Chen Luo
> Assignee: Chen Luo
> Priority: Major
>
> Currently, the merge of inverted index is implemented by a full range scan,
> i.e., token+key pairs are generated and fed into a priority queue to obtain a
> global ordering. However, it is typical that a token can correspond to tens
> or hundreds (or even much more) keys. As a result, comparisons of tokens are
> wasted because for many times tokens would be the same. To improve this, we
> can have two priority queues, one for tokens and one for keys. For each
> token, we merge their inverted lists using the key priority queue. After
> that, we fetch the next token from the token queue, and merge their inverted
> lists again.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)