[
https://issues.apache.org/jira/browse/ASTERIXDB-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721823#comment-17721823
]
ASF subversion and git services commented on ASTERIXDB-3180:
------------------------------------------------------------
Commit fa8a284f41ffdcdd8f2a6576f6333efc51fcbd4e in asterixdb's branch
refs/heads/master from Wail Alkowaileet
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=fa8a284f41 ]
[ASTERIXDB-3180][COMP][RT] Apply filter before assembling columnar datasets
- user model changes: no
- storage format changes: no
- interface changes: yes
Details:
This patch implements an idea by Mike Carey: use the
columns as a "poor man's index". The condition expression
of SELECT is pushed down to the data scan, and the
following is performed for each mega-leaf node:
1- Read all the columns involved in the SELECT condition expression.
2- Look for a tuple that satisfies the condition
   - If none exists, skip reading the rest of the columns
   - If at least one exists, read the rest of the columns
3- For each subsequent call to next() in the LSM cursor,
   check whether the returned tuple satisfies the condition
   - If yes, assemble and return the tuple
   - If no, skip and go to the next tuple and repeat
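
The three steps above can be sketched roughly as follows. This is only an
illustrative Python model, not the actual AsterixDB code: MegaLeaf,
read_column, and scan_leaf are hypothetical names, and "reading" a column
here just means recording that it was touched.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


class MegaLeaf:
    """Hypothetical stand-in for one columnar mega-leaf node."""

    def __init__(self, columns: Dict[str, List[object]], num_tuples: int):
        self._columns = columns
        self.num_tuples = num_tuples
        self.columns_read: List[str] = []  # tracks simulated column I/O

    def read_column(self, name: str) -> List[object]:
        self.columns_read.append(name)
        return self._columns[name]


def scan_leaf(
    leaf: MegaLeaf,
    filter_cols: List[str],
    other_cols: List[str],
    predicate: Callable[[Row], bool],
) -> Iterator[Row]:
    # Step 1: read only the columns used by the condition expression.
    filter_data = {c: leaf.read_column(c) for c in filter_cols}

    def filter_tuple(i: int) -> Row:
        return {c: filter_data[c][i] for c in filter_cols}

    # Step 2: if no tuple satisfies the condition, skip the other columns.
    if not any(predicate(filter_tuple(i)) for i in range(leaf.num_tuples)):
        return
    rest = {c: leaf.read_column(c) for c in other_cols}

    # Step 3: assemble and return only the tuples that pass the condition.
    for i in range(leaf.num_tuples):
        values = filter_tuple(i)
        if predicate(values):
            values.update({c: rest[c][i] for c in other_cols})
            yield values
```

In this sketch, a leaf with no matching tuple only ever has its filter
columns read, which is where the I/O saving would come from.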
Change-Id: Ia83b839633d83ac6e3ffb4340a1d144daa0b299d
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17510
Integration-Tests: Jenkins <[email protected]>
Tested-by: Jenkins <[email protected]>
Reviewed-by: Wail Alkowaileet <[email protected]>
Reviewed-by: Ali Alsuliman <[email protected]>
> Apply filter before assembling columnar datasets
> ------------------------------------------------
>
> Key: ASTERIXDB-3180
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-3180
> Project: Apache AsterixDB
> Issue Type: Improvement
> Components: COMP - Compiler, RT - Runtime
> Affects Versions: 0.9.9
> Reporter: Wail Y. Alkowaileet
> Assignee: Wail Y. Alkowaileet
> Priority: Major
> Fix For: 0.9.9
>
>
> The idea here is to examine column(s) in the WHERE clause before record
> assembly (Mike Carey refers to this approach as "poor man's index"). The
> sequence could be summarized as follows:
> * We first read the filtering columns (i.e., columns in the WHERE clause)
> * If the filtering column values
> ** satisfy the query predicate, we read the rest of the requested columns
> and assemble the record
> ** do not satisfy it, we simply fetch the next tuple
> This approach can reduce I/O (by skipping column reads where possible) and
> also avoids assembling records that would be filtered out anyway, which is
> wasted CPU expense.