[ 
https://issues.apache.org/jira/browse/ASTERIXDB-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17944611#comment-17944611
 ] 

ASF subversion and git services commented on ASTERIXDB-3576:
------------------------------------------------------------

Commit c42cd2602c8dbb3090b5056db8db29f00e848cfd in asterixdb's branch 
refs/heads/master from Peeyush Gupta
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=c42cd2602c ]

[ASTERIXDB-3576][EXT] push predicates down to delta tables to filter row groups

- user model changes: no
- storage format changes: no
- interface changes: no

Details:
A Delta table's data files are essentially Parquet files. Parquet allows
applying a predicate while reading data files to skip row groups.
With this patch we push filters down to the individual Parquet files of the
Delta table to filter row groups. The Predicate class of the Delta Kernel API
is not serializable, so we have added custom serialization/deserialization
for Delta Kernel API predicates.
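
As a rough illustration of the custom serialization idea, the sketch below
round-trips a small predicate tree through a byte array using a pre-order
encoding. It is not the code from this patch: Expr and ExprCodec are
hypothetical stand-ins and do not use the Delta Kernel classes, and the
encoding (kind, value, child count, children) is an assumption chosen for
brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a non-serializable predicate tree (not a Delta Kernel class).
final class Expr {
    static final byte COLUMN = 0, LITERAL = 1, PREDICATE = 2;

    final byte kind;
    final String value;        // column name, literal text, or operator name such as ">" or "AND"
    final List<Expr> children; // empty for columns and literals

    Expr(byte kind, String value, List<Expr> children) {
        this.kind = kind;
        this.value = value;
        this.children = children;
    }

    static Expr column(String name) { return new Expr(COLUMN, name, List.of()); }
    static Expr literal(long v) { return new Expr(LITERAL, Long.toString(v), List.of()); }
    static Expr predicate(String op, Expr... args) { return new Expr(PREDICATE, op, List.of(args)); }

    @Override
    public String toString() {
        return kind == PREDICATE ? value + children : value;
    }
}

// Hypothetical codec: writes each node as kind, value, child count, then its children.
final class ExprCodec {
    static void write(Expr e, DataOutputStream out) throws IOException {
        out.writeByte(e.kind);
        out.writeUTF(e.value);
        out.writeInt(e.children.size());
        for (Expr child : e.children) {
            write(child, out);
        }
    }

    static Expr read(DataInputStream in) throws IOException {
        byte kind = in.readByte();
        String value = in.readUTF();
        int n = in.readInt();
        List<Expr> children = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            children.add(read(in));
        }
        return new Expr(kind, value, children);
    }

    static byte[] toBytes(Expr e) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            write(e, out);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buf.toByteArray();
    }

    static Expr fromBytes(byte[] bytes) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

With an encoding of this kind, the side that plans the query can ship the
bytes to the readers, which rebuild an equivalent tree before opening the
Parquet files.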

Ext-ref: MB-65315
Change-Id: I9fa1a84d7be63ada7b9768a81984b2172e7401b3
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/19527
Integration-Tests: Jenkins <[email protected]>
Reviewed-by: Peeyush Gupta <[email protected]>
Reviewed-by: Ali Alsuliman <[email protected]>
Tested-by: Jenkins <[email protected]>


> Pushdown predicates for Delta tables to filter row groups
> ---------------------------------------------------------
>
>                 Key: ASTERIXDB-3576
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3576
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>          Components: EXT - External data
>            Reporter: Peeyush Gupta
>            Assignee: Peeyush Gupta
>            Priority: Major
>              Labels: triaged
>
> Delta tables store min/max values for each column, which can later be 
> used to filter out rows while reading the Parquet files. This avoids 
> some of the time spent assembling rows. We should implement this to 
> improve the query performance of Delta tables.
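
The min/max idea described above can be sketched as follows. This is a
minimal illustration under stated assumptions, not AsterixDB or Parquet
code: RowGroupStats and RowGroupPruner are hypothetical names, and the
statistics are reduced to a single long-valued column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-row-group statistics for one numeric column.
final class RowGroupStats {
    final long min;
    final long max;

    RowGroupStats(long min, long max) {
        this.min = min;
        this.max = max;
    }
}

// Hypothetical pruner: decides from min/max alone whether a row group can be skipped.
final class RowGroupPruner {
    // True when some row in the group could satisfy "column > bound".
    static boolean mayContainGreaterThan(RowGroupStats s, long bound) {
        return s.max > bound;
    }

    // True when some row in the group could satisfy "column = value".
    static boolean mayContainEqualTo(RowGroupStats s, long value) {
        return s.min <= value && value <= s.max;
    }

    // Indexes of the row groups that must still be read for "column > bound".
    static List<Integer> survivingGroups(List<RowGroupStats> groups, long bound) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < groups.size(); i++) {
            if (mayContainGreaterThan(groups.get(i), bound)) {
                keep.add(i);
            }
        }
        return keep;
    }
}
```

The check is conservative: a surviving row group may still contain rows
that fail the predicate, so the filter has to be applied again to the rows
that are actually read.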



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
