[ https://issues.apache.org/jira/browse/ASTERIXDB-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701435#comment-17701435 ]

ASF subversion and git services commented on ASTERIXDB-3134:
------------------------------------------------------------

Commit bc24e1419074bea041159ea1865c4d1c629a5697 in asterixdb's branch 
refs/heads/master from Wail Alkowaileet
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=bc24e14190 ]

[ASTERIXDB-3141][ASTERIXDB-3134] Allow querying columnar datasets

- user model changes: yes
- storage format changes: no
- interface changes: yes

Details:
This patch adds the ability to query columnar datasets.
It also teaches the compiler to read only the requested
columns, and adds the ability to filter mega leaf nodes
given a query predicate.
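To illustrate what "read only the requested columns" means, here is a minimal Python sketch. It is not AsterixDB's implementation. It assumes a hypothetical in-memory layout where each column is stored as its own list, so a projection touches only the lists it names. The function name project() and the sample data are made up for illustration.

```python
# Hypothetical columnar layout: one list per column.
# This is NOT AsterixDB's on-disk format; it only illustrates the idea.
employees = {
    "name":    ["Ann", "Bob", "Cy"],
    "age":     [25, 41, 29],
    "salary":  [120000, 90000, 101000],
    "address": ["A St", "B St", "C St"],
}

def project(columns, requested):
    """Build rows from the requested columns only.

    Columns that are not requested (here, "salary" and "address")
    are never accessed, which is the point of a columnar projection.
    """
    selected = [columns[name] for name in requested]
    return [dict(zip(requested, values)) for values in zip(*selected)]

rows = project(employees, ["name", "age"])
# rows == [{"name": "Ann", "age": 25}, {"name": "Bob", "age": 41}, {"name": "Cy", "age": 29}]
```

In a row-oriented layout, every field of every record would have to be read to extract name and age. Storing columns separately is what lets the reader skip the rest.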

Interface changes:
- IMetadataProvider#getScannerRuntime()
  * To allow projections for both data records and meta records
- IProjectionInfo
  * Renamed to IProjectionFiltrationInfo
  * Added getFilterExpression() for columnar filters

User model changes:
- After this change, you can create columnar datasets
Example:
  CREATE DATASET ExperDataset(ExperType)
  PRIMARY KEY uid AUTOGENERATED
  WITH {
    "dataset-format":{"format":"column"}
  };

- Added compiler property:
  * compiler.column.filter
  to enable or disable the use of columnar filters

- Added storage properties:
  * storage.column.max.tuple.count
  An integer specifying the maximum number of
  tuples to store per mega leaf node
  * storage.column.free.space.tolerance
  The percentage of empty space to tolerate
  in order to minimize column splitting
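The effect of storage.column.max.tuple.count can be pictured with a small Python sketch. This is an assumption-laden illustration, not AsterixDB code. It models only the tuple-count cap: tuples are cut into chunks of at most max_tuple_count, and each chunk records a per-column (min, max) pair, like the per-node min/max values described in the issue below. It does not model storage.column.free.space.tolerance or any other reason the real storage layer might close a node earlier. The names MegaLeafNode and build_mega_leaf_nodes are hypothetical, and every tuple is assumed to have the same columns.

```python
from dataclasses import dataclass

@dataclass
class MegaLeafNode:
    # Hypothetical structure: the node's tuples plus (min, max) per column.
    tuples: list
    min_max: dict

def build_mega_leaf_nodes(tuples, max_tuple_count):
    """Split tuples into nodes of at most max_tuple_count tuples each
    and record the min/max of every column in each node.

    Sketch only: assumes all tuples share the same columns and that
    the tuple-count cap is the only reason to start a new node.
    """
    if max_tuple_count < 1:
        raise ValueError("max_tuple_count must be >= 1")
    nodes = []
    for start in range(0, len(tuples), max_tuple_count):
        chunk = tuples[start:start + max_tuple_count]
        min_max = {}
        for column in chunk[0]:
            values = [t[column] for t in chunk]
            min_max[column] = (min(values), max(values))
        nodes.append(MegaLeafNode(chunk, min_max))
    return nodes

nodes = build_mega_leaf_nodes([{"age": a} for a in [25, 41, 29, 33, 18]], 2)
# Three nodes with age ranges (25, 41), (29, 33), and (18, 18).
```

A smaller cap yields more, narrower nodes, which gives tighter min/max ranges (better pruning) at the cost of more per-node metadata.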

Change-Id: Ie9188bbd8463db22bf10c6871046c680528d5640
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17430
Integration-Tests: Jenkins <[email protected]>
Tested-by: Jenkins <[email protected]>
Reviewed-by: Wail Alkowaileet <[email protected]>
Reviewed-by: Murtadha Hubail <[email protected]>


> Enable columnar filters
> -----------------------
>
>                 Key: ASTERIXDB-3134
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3134
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>          Components: COMP - Compiler, STO - Storage
>    Affects Versions: 0.9.9
>            Reporter: Wail Y. Alkowaileet
>            Assignee: Wail Y. Alkowaileet
>            Priority: Major
>             Fix For: 0.9.9
>
>
> One of the features of the new columnar format is that it keeps the min/max 
> values for each column per mega leaf node (a multi-page leaf node, which 
> stores 15K tuples by default). For queries with predicates, these filters can 
> be used to skip reading the columns of the tuples in a mega leaf node 
> that cannot satisfy those predicates.
> For example, in the following query:
> {code:java}SELECT name, age, salary
> FROM Employee
> WHERE age BETWEEN 20 AND 30
> AND salary > 100000{code}
> the columns name, age, and salary will be read only if the mega leaf node 
> may contain employees whose ages are between 20 and 30 and whose salary is 
> greater than 100K.
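The pruning decision for the example query can be sketched in Python. This is a hypothetical illustration of the idea in the issue, not AsterixDB's filter evaluator. Each node is reduced to assumed per-column (min, max) pairs, and a node is skipped only when those ranges prove that no tuple can satisfy age BETWEEN 20 AND 30 AND salary > 100000. The function may_satisfy() and the sample ranges are made up. NULL/MISSING values and mixed types are not modeled.

```python
def may_satisfy(min_max):
    """Return False only when min/max alone prove that no tuple in the node
    can satisfy: age BETWEEN 20 AND 30 AND salary > 100000."""
    age_min, age_max = min_max["age"]
    _, salary_max = min_max["salary"]
    # The node's age range must overlap [20, 30].
    age_overlaps = age_max >= 20 and age_min <= 30
    # At least one salary in the node could exceed 100000.
    salary_overlaps = salary_max > 100000
    return age_overlaps and salary_overlaps

nodes = [
    {"age": (22, 45), "salary": (50000, 150000)},  # ranges overlap: read
    {"age": (31, 60), "salary": (80000, 200000)},  # every age > 30: skip
    {"age": (18, 25), "salary": (40000, 95000)},   # every salary <= 100K: skip
]
to_read = [i for i, node in enumerate(nodes) if may_satisfy(node)]
# to_read == [0]
```

The check is conservative. A node that passes, like node 0, may still hold no matching tuple, because the age and salary ranges are tracked independently. The query therefore still evaluates the predicate on every tuple it reads. The filter only avoids reading columns in nodes that are provably irrelevant.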



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
