alamb opened a new issue #1694:
URL: https://github.com/apache/arrow-datafusion/issues/1694


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   In IOx, each table is broken up logically into chunks (like row groups in 
parquet files), but the chunks might be missing some columns and each chunk has 
its own statistics.
   
   When predicates are applied to scan / filter these chunks, they are 
potentially in terms of all columns of a table. If a chunk is missing a column 
(so the column is entirely null for that chunk), or we know from statistics 
that the column contains no nulls, expressions like `col IS NULL` and 
`col IS NOT NULL` can be replaced with `true` or `false`, and predicates like 
`col > 5` can be replaced with `null > 5` in some cases.
   
   Once this substitution is done, that may allow additional simplification of 
the predicate -- ideally all the way down to `true` or `false` 
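   
   As a rough illustration of the substitute-then-simplify idea, here is a 
minimal sketch on a toy expression type. `ToyExpr`, `substitute_missing` and 
`simplify` are hypothetical names made up for this example; none of this is 
DataFusion's actual `Expr` or optimizer code.
   
   ```rust
   /// Toy expression type for illustration only (NOT DataFusion's `Expr`).
   #[derive(Debug, Clone, PartialEq)]
   enum ToyExpr {
       Column(String),
       Null,
       Bool(bool),
       Int(i64),
       IsNull(Box<ToyExpr>),
       IsNotNull(Box<ToyExpr>),
       Gt(Box<ToyExpr>, Box<ToyExpr>),
       And(Box<ToyExpr>, Box<ToyExpr>),
   }

   /// Replace references to columns that are not in `present` with NULL.
   fn substitute_missing(expr: ToyExpr, present: &[&str]) -> ToyExpr {
       use ToyExpr::*;
       let sub = |e: Box<ToyExpr>| Box::new(substitute_missing(*e, present));
       match expr {
           Column(name) if !present.contains(&name.as_str()) => Null,
           IsNull(e) => IsNull(sub(e)),
           IsNotNull(e) => IsNotNull(sub(e)),
           Gt(l, r) => Gt(sub(l), sub(r)),
           And(l, r) => And(sub(l), sub(r)),
           other => other,
       }
   }

   /// Simplify bottom-up, using SQL three-valued logic for comparisons.
   fn simplify(expr: ToyExpr) -> ToyExpr {
       use ToyExpr::*;
       match expr {
           IsNull(e) => match simplify(*e) {
               Null => Bool(true),
               Bool(_) | Int(_) => Bool(false),
               other => IsNull(Box::new(other)),
           },
           IsNotNull(e) => match simplify(*e) {
               Null => Bool(false),
               Bool(_) | Int(_) => Bool(true),
               other => IsNotNull(Box::new(other)),
           },
           Gt(l, r) => match (simplify(*l), simplify(*r)) {
               // A comparison with NULL on either side is NULL.
               (Null, _) | (_, Null) => Null,
               (Int(a), Int(b)) => Bool(a > b),
               (l, r) => Gt(Box::new(l), Box::new(r)),
           },
           And(l, r) => match (simplify(*l), simplify(*r)) {
               (Bool(false), _) | (_, Bool(false)) => Bool(false),
               (Bool(true), other) | (other, Bool(true)) => other,
               (l, r) => And(Box::new(l), Box::new(r)),
           },
           other => other,
       }
   }
   ```
   
   With only column `a` present, `b IS NULL AND a > 5` becomes 
`null IS NULL AND a > 5` after substitution and then simplifies to `a > 5`, 
while `b > 5` collapses to `null`, which a filter would treat as not matching.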
   
   One particular expression of this type that we will use in IOx maps `null` 
to `''`, like this:
   
   ```sql
   CASE 
     WHEN col is NULL THEN '' 
     ELSE col 
   END
   ```
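   
   To make the payoff concrete, here is a small sketch of how such a `CASE` 
could fold away once the column is known to be null (or known to be a literal). 
`CaseExpr`, `null_to_empty` and `simplify_case` are hypothetical names for this 
illustration and are not part of DataFusion.
   
   ```rust
   /// Toy expression type for illustration only (NOT DataFusion's `Expr`).
   #[derive(Debug, Clone, PartialEq)]
   enum CaseExpr {
       Column(String),
       Null,
       Bool(bool),
       Utf8(String),
       IsNull(Box<CaseExpr>),
       Case {
           when: Box<CaseExpr>,
           then: Box<CaseExpr>,
           otherwise: Box<CaseExpr>,
       },
   }

   /// Build `CASE WHEN col IS NULL THEN '' ELSE col END`.
   fn null_to_empty(col: CaseExpr) -> CaseExpr {
       CaseExpr::Case {
           when: Box::new(CaseExpr::IsNull(Box::new(col.clone()))),
           then: Box::new(CaseExpr::Utf8(String::new())),
           otherwise: Box::new(col),
       }
   }

   /// Fold `IS NULL` on known values, then pick the CASE branch if the
   /// condition became a constant.
   fn simplify_case(expr: CaseExpr) -> CaseExpr {
       use CaseExpr::*;
       match expr {
           IsNull(e) => match simplify_case(*e) {
               Null => Bool(true),
               Bool(_) | Utf8(_) => Bool(false),
               other => IsNull(Box::new(other)),
           },
           Case { when, then, otherwise } => match simplify_case(*when) {
               Bool(true) => simplify_case(*then),
               // A false or NULL condition falls through to ELSE.
               Bool(false) | Null => simplify_case(*otherwise),
               cond => Case {
                   when: Box::new(cond),
                   then: Box::new(simplify_case(*then)),
                   otherwise: Box::new(simplify_case(*otherwise)),
               },
           },
           other => other,
       }
   }
   ```
   
   When the column has been replaced by `Null`, the whole expression reduces 
to the empty string literal; when the column is still an unresolved `Column`, 
the `CASE` is left as it was.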
   
   The same general pattern likely holds for ParquetExec now that 
@thinkharderdev has added support to merge schemas for multiple files in 
https://github.com/apache/arrow-datafusion/pull/1622. Once DataFusion is able 
to push predicates down into the parquet scans, simplifying the predicates as 
much as possible beforehand would be ideal. 
   
   The current API in 
https://github.com/apache/arrow-datafusion/blob/03075d5f4b3fdfd8f82144fcd409418832a4bf69/datafusion/src/optimizer/simplify_expressions.rs
 is 
   1. Private
   2. Requires `ExecutionProps` which is fairly entangled with the overall 
machinery of how plans are executed (and means we see issues like #1690 )
   
   **Describe the solution you'd like**
   I would like DataFusion to have a public API for simplifying expressions. 
The proposed API looks like
   
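   ```rust
   // What expression evaluation needs from its environment (methods TBD)
   pub trait ExprEvalContext {
   }
   
   impl Expr {
     // Consumes the expression and returns a simplified version of it
     pub fn simplify(self, context: &dyn ExprEvalContext) -> Self {
       todo!() // simplification logic goes here
     }
   
   }
   ```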
   
   I am thinking of `ExprEvalContext` as a trait so that it is clear what 
expression evaluation actually requires, and so that `Expr`s can be simplified 
either prior to execution or in the bowels of DataFusion's planner (and I will 
implement it for `ExecutionProps`). 
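   
   To show the shape of this idea, here is a self-contained sketch, not the 
final design. The trait method `is_all_null`, the tiny `Expr` enum and 
`ChunkContext` are all assumptions invented for this example; the real trait 
would expose whatever evaluation turns out to need.
   
   ```rust
   use std::collections::HashSet;

   /// Hypothetical context trait: the one method here is an assumption.
   pub trait ExprEvalContext {
       /// True if `column` is known to contain only NULLs.
       fn is_all_null(&self, column: &str) -> bool;
   }

   /// Tiny stand-in for DataFusion's `Expr` (illustration only).
   #[derive(Debug, Clone, PartialEq)]
   pub enum Expr {
       Column(String),
       Bool(bool),
       IsNull(Box<Expr>),
       Not(Box<Expr>),
   }

   impl Expr {
       /// Simplify using only what the context exposes.
       pub fn simplify(self, context: &dyn ExprEvalContext) -> Self {
           match self {
               Expr::IsNull(inner) => match *inner {
                   Expr::Column(name) if context.is_all_null(&name) => {
                       Expr::Bool(true)
                   }
                   other => Expr::IsNull(Box::new(other.simplify(context))),
               },
               Expr::Not(inner) => match (*inner).simplify(context) {
                   Expr::Bool(b) => Expr::Bool(!b),
                   other => Expr::Not(Box::new(other)),
               },
               other => other,
           }
       }
   }

   /// Hypothetical per-chunk context: columns missing from the chunk.
   pub struct ChunkContext {
       missing: HashSet<String>,
   }

   impl ChunkContext {
       pub fn new(missing: &[&str]) -> Self {
           Self {
               missing: missing.iter().map(|c| c.to_string()).collect(),
           }
       }
   }

   impl ExprEvalContext for ChunkContext {
       fn is_all_null(&self, column: &str) -> bool {
           self.missing.contains(column)
       }
   }
   ```
   
   Because `simplify` only sees `&dyn ExprEvalContext`, a caller outside the 
planner can supply its own small context like the one above, and 
`ExecutionProps` could implement the same trait for use inside the planner.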
   
   **Describe alternatives you've considered**
   I am not fully sure about the API design -- I'll know more when I sketch one 
out
   
   **Additional context**
   https://github.com/apache/arrow-datafusion/issues/1693 
   https://github.com/influxdata/influxdb_iox/pull/3557



