[ 
https://issues.apache.org/jira/browse/ARROW-9770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245907#comment-17245907
 ] 

Remi Dettai edited comment on ARROW-9770 at 12/8/20, 2:11 PM:
--------------------------------------------------------------

This could also be used to push filters down into the catalog. If the 
combination (AND) of:
 - the file/partition statistics ({{col( x ) >= min and col( x ) <= max}})
 - the filter expression

can be simplified to {{false}}, there is no need to read that file/partition. 
Isn't that wonderful? :)
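
A minimal Rust sketch of that pruning check, assuming inclusive min/max 
statistics on a single Int64 column and a single comparison filter. The names 
{{Op}}, {{Filter}}, {{ColumnStats}} and {{can_skip_file}} are hypothetical and 
only for illustration; they are not DataFusion or Arrow APIs:

```rust
/// Hypothetical comparison operators for a one-column filter.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Op {
    Eq,
    Lt,
    LtEq,
    Gt,
    GtEq,
}

/// Hypothetical filter of the form `col <op> value` on an Int64 column.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Filter {
    pub op: Op,
    pub value: i64,
}

/// Hypothetical file/partition statistics.
/// Assumption: every value in the file satisfies `min <= col <= max`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ColumnStats {
    pub min: i64,
    pub max: i64,
}

/// Returns true when `(min <= col AND col <= max) AND filter` folds to
/// `false`, i.e. no row in the file can match and the file can be skipped.
pub fn can_skip_file(stats: &ColumnStats, filter: &Filter) -> bool {
    let v = filter.value;
    match filter.op {
        // col = v is impossible when v lies outside [min, max].
        Op::Eq => v < stats.min || v > stats.max,
        // col < v is impossible when even the smallest value is >= v.
        Op::Lt => stats.min >= v,
        // col <= v is impossible when even the smallest value is > v.
        Op::LtEq => stats.min > v,
        // col > v is impossible when even the largest value is <= v.
        Op::Gt => stats.max <= v,
        // col >= v is impossible when even the largest value is < v.
        Op::GtEq => stats.max < v,
    }
}
```

With statistics {{min = 10, max = 20}}, a filter {{col = 5}} lets the file be 
skipped, while {{col = 10}} does not.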

Note: there is an implementation of this folding for the C++ dataset API: 
[https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/filter.h]


was (Author: rdettai):
This could also be used to apply filter pushdown into the catalog, if the 
combination (and) of:
 - the file/partition statistics ({{col( x ) >= min and col( x ) <= max}})
 - the filter expression

can be simplified to {{false}}, there is no need to read that file/partition. 
Isn't that wonderful? :)

Note: there is an implementation of this folding in C++ dataset: 
https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/filter.h

> [Rust] [DataFusion] Add constant folding to expressions during logical 
> planning
> ---------------------------------------------------------------------------------
>
>                 Key: ARROW-9770
>                 URL: https://issues.apache.org/jira/browse/ARROW-9770
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust, Rust - DataFusion
>            Reporter: Andrew Lamb
>            Priority: Minor
>
> The high-level idea is that if an expression can be partially evaluated 
> at planning time, then:
> # The execution time will be reduced
> # There may be additional optimizations possible (like removing entire 
> LogicalPlan nodes, for example)
> I recently saw the following selection expression created (by the [predicate 
> push down|https://github.com/apache/arrow/pull/7880])
> {code}
> Selection: #a Eq Int64(1) And #b GtEq Int64(1) And #a LtEq Int64(1) And #a Eq 
> Int64(1) And #b GtEq Int64(1) And #a LtEq Int64(1)
>               TableScan: test projection=None
> {code}
> This could be simplified significantly:
> 1. Duplicate clauses could be removed (e.g. `#a Eq Int64(1) And #a Eq 
> Int64(1)` --> `#a Eq Int64(1)`)
> 2. Algebraic simplification (e.g. `A<=5 and A=5` is the same as `A=5`)
> Inspiration can be taken from the postgres code that evaluates constant 
> expressions 
> https://doxygen.postgresql.org/clauses_8c.html#ac91c4055a7eb3aa6f1bc104479464b28
> (in this case, for example, if you have a predicate A=5 then you can basically 
> substitute 5 for A in any expression higher up in the plan).
> Other classic optimizations include things such as `A OR TRUE` --> `TRUE`, `A 
> AND TRUE` --> `A`, etc.
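
A rough Rust sketch of two of the rewrites described above: removing duplicate 
clauses from a conjunction, and folding boolean constants (treating 
{{A OR TRUE}} as {{TRUE}} and {{A AND TRUE}} as {{A}}). The {{Expr}} type here 
is a hypothetical miniature, not DataFusion's {{Expr}}; predicates are kept as 
opaque strings, so algebraic rules such as {{A<=5 and A=5}} are not attempted:

```rust
/// Hypothetical miniature expression tree (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// A boolean constant.
    Lit(bool),
    /// An opaque predicate such as `#a Eq Int64(1)`, kept as text.
    Pred(String),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

pub fn pred(s: &str) -> Expr {
    Expr::Pred(s.to_string())
}

pub fn and(l: Expr, r: Expr) -> Expr {
    Expr::And(Box::new(l), Box::new(r))
}

pub fn or(l: Expr, r: Expr) -> Expr {
    Expr::Or(Box::new(l), Box::new(r))
}

/// Flattens nested ANDs into a list of conjuncts, left to right.
fn conjuncts(e: Expr, out: &mut Vec<Expr>) {
    match e {
        Expr::And(l, r) => {
            conjuncts(*l, out);
            conjuncts(*r, out);
        }
        other => out.push(other),
    }
}

/// Simplifies bottom-up: folds constants and drops duplicate conjuncts.
pub fn simplify(e: Expr) -> Expr {
    match e {
        Expr::And(l, r) => {
            let mut parts = Vec::new();
            conjuncts(simplify(*l), &mut parts);
            conjuncts(simplify(*r), &mut parts);
            let mut kept: Vec<Expr> = Vec::new();
            for p in parts {
                match p {
                    // x AND FALSE --> FALSE
                    Expr::Lit(false) => return Expr::Lit(false),
                    // x AND TRUE --> x
                    Expr::Lit(true) => {}
                    // x AND x --> x (first occurrence wins)
                    p => {
                        if !kept.contains(&p) {
                            kept.push(p)
                        }
                    }
                }
            }
            // Rebuild a left-deep AND chain; an empty conjunction is TRUE.
            let mut it = kept.into_iter();
            match it.next() {
                None => Expr::Lit(true),
                Some(first) => it.fold(first, and),
            }
        }
        Expr::Or(l, r) => match (simplify(*l), simplify(*r)) {
            // x OR TRUE --> TRUE
            (Expr::Lit(true), _) | (_, Expr::Lit(true)) => Expr::Lit(true),
            // x OR FALSE --> x
            (Expr::Lit(false), x) | (x, Expr::Lit(false)) => x,
            // x OR x --> x
            (l, r) if l == r => l,
            (l, r) => or(l, r),
        },
        other => other,
    }
}
```

Applied to the six-clause selection above, {{simplify}} would keep only the 
first occurrence of each of the three distinct clauses.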



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
