[
https://issues.apache.org/jira/browse/ASTERIXDB-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701435#comment-17701435
]
ASF subversion and git services commented on ASTERIXDB-3134:
------------------------------------------------------------
Commit bc24e1419074bea041159ea1865c4d1c629a5697 in asterixdb's branch
refs/heads/master from Wail Alkowaileet
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=bc24e14190 ]
[ASTERIXDB-3141][ASTERIXDB-3134] Allow querying columnar datasets
- user model changes: yes
- storage format changes: no
- interface changes: yes
Details:
This patch adds the ability to query columnar datasets.
Also, it teaches the compiler to read only the requested
columns. This patch also includes the ability to filter
mega-leaf nodes given a query predicate.
Interface changes:
- IMetadataProvider#getScannerRuntime()
* To allow projections for both data records and meta records
- IProjectionInfo
* Renamed to IProjectionFiltrationInfo
* Added getFilterExpression() for columnar filters
User model changes:
- After this change you can create columnar datasets
Example:
CREATE DATASET ExperDataset(ExperType)
PRIMARY KEY uid AUTOGENERATED
WITH {
"dataset-format":{"format":"column"}
};
- Added compiler property:
* compiler.column.filter
Enables or disables the use of columnar filters
- Added storage properties:
* storage.column.max.tuple.count
An integer specifying the maximum number of
tuples to store per mega leaf node
* storage.column.free.space.tolerance
The percentage of empty space tolerated in order to
minimize column splitting
Change-Id: Ie9188bbd8463db22bf10c6871046c680528d5640
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17430
Integration-Tests: Jenkins <[email protected]>
Tested-by: Jenkins <[email protected]>
Reviewed-by: Wail Alkowaileet <[email protected]>
Reviewed-by: Murtadha Hubail <[email protected]>
> Enable columnar filters
> -----------------------
>
> Key: ASTERIXDB-3134
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-3134
> Project: Apache AsterixDB
> Issue Type: Improvement
> Components: COMP - Compiler, STO - Storage
> Affects Versions: 0.9.9
> Reporter: Wail Y. Alkowaileet
> Assignee: Wail Y. Alkowaileet
> Priority: Major
> Fix For: 0.9.9
>
>
> One of the features of the new columnar format is that it keeps the min/max
> values for each column per mega leaf node (a multi-page leaf node, which
> stores 15K tuples by default). For queries with predicates, these min/max
> values can serve as filters: the columns of a mega leaf node need not be read
> when its value ranges cannot satisfy those predicates.
> For example, in the following query:
> {code:sql}
> SELECT name, age, salary
> FROM Employee
> WHERE age BETWEEN 20 AND 30
> AND salary > 100000
> {code}
> the columns name, age, and salary will be read only if the mega leaf node
> contains employees whose ages are between 20 and 30 and whose salaries are
> greater than 100K.
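
The skipping logic described above can be illustrated with a minimal Python sketch. It is not AsterixDB's implementation: the MegaLeaf class, the may_match and scan functions, and the sample data are all hypothetical, and each toy leaf holds two tuples rather than the default 15K. It only shows how per-column min/max values let a scan skip a whole mega leaf node, and why the predicate must still be re-checked per tuple in nodes that pass the filter.

```python
from dataclasses import dataclass, field


@dataclass
class MegaLeaf:
    """Hypothetical stand-in for a mega leaf node: column name -> values."""
    columns: dict
    stats: dict = field(init=False)

    def __post_init__(self):
        # Mimic the stored per-column min/max metadata (computed once here).
        self.stats = {c: (min(v), max(v)) for c, v in self.columns.items()}


def may_match(leaf):
    """Conservative check for: age BETWEEN 20 AND 30 AND salary > 100000."""
    age_min, age_max = leaf.stats["age"]
    _, salary_max = leaf.stats["salary"]
    # The age range must overlap [20, 30] and some salary must exceed 100000.
    return age_min <= 30 and age_max >= 20 and salary_max > 100000


def scan(leaves):
    """Return (matching rows, number of leaves skipped via min/max)."""
    results, skipped = [], 0
    for leaf in leaves:
        if not may_match(leaf):
            skipped += 1  # no column of this leaf is read
            continue
        cols = leaf.columns
        # The filter can yield false positives, so re-check every tuple.
        for name, age, salary in zip(cols["name"], cols["age"], cols["salary"]):
            if 20 <= age <= 30 and salary > 100000:
                results.append((name, age, salary))
    return results, skipped


EXAMPLE_LEAVES = [
    MegaLeaf({"name": ["Ann", "Bob"], "age": [25, 41], "salary": [120000, 90000]}),
    MegaLeaf({"name": ["Cid", "Dee"], "age": [45, 52], "salary": [150000, 130000]}),
    MegaLeaf({"name": ["Eve", "Fay"], "age": [22, 29], "salary": [60000, 80000]}),
    MegaLeaf({"name": ["Gus", "Hal"], "age": [24, 50], "salary": [70000, 200000]}),
]

if __name__ == "__main__":
    rows, skipped = scan(EXAMPLE_LEAVES)
    print(rows, skipped)
```

In this sample, the second leaf is skipped on the age range and the third on the salary maximum. The fourth leaf passes the min/max check even though none of its tuples satisfies both predicates, which is why the per-tuple check remains necessary.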