alamb opened a new issue, #8689:
URL: https://github.com/apache/arrow-datafusion/issues/8689

   ### Is your feature request related to a problem or challenge?
   
   DataFusion tries to avoid doing work whenever possible to improve query performance.
   
   Part of this effort is determining when filters can never be true and skipping the work they would otherwise require.
   
   For example:
   
   ```sql
   DataFusion CLI v34.0.0
   ❯ create table t(x int) as values (1), (2), (3);
   0 rows in set. Query took 0.003 seconds.
   ```
   
   When DataFusion sees a filter that can never be true, it skips scanning the data entirely:
   ```
   ❯ explain select x from t where false;
   +---------------+---------------+
   | plan_type     | plan          |
   +---------------+---------------+
   | logical_plan  | EmptyRelation |
   | physical_plan | EmptyExec     |
   |               |               |
   +---------------+---------------+
   2 rows in set. Query took 0.001 seconds.
   ```
   
   However, it currently does not skip scanning if the filter is NULL, which also can never be true, because `WHERE` keeps only rows for which the predicate evaluates to `true`. Note that the `MemoryExec` is still present:
   
   ```
   ❯ explain select x from t where null::bool;
   +---------------+---------------------------------------------------+
   | plan_type     | plan                                              |
   +---------------+---------------------------------------------------+
   | logical_plan  | Filter: Boolean(NULL)                             |
   |               |   TableScan: t projection=[x]                     |
   | physical_plan | CoalesceBatchesExec: target_batch_size=8192       |
   |               |   FilterExec: NULL                                |
   |               |     MemoryExec: partitions=1, partition_sizes=[1] |
   |               |                                                   |
   +---------------+---------------------------------------------------+
   ```
   
   
   ### Describe the solution you'd like
   
   I would like to avoid scanning when the filter evaluates to NULL, in addition to `false`. The second example above should not contain a `MemoryExec`.
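To illustrate the requested behavior, here is a minimal sketch in Rust. It uses hypothetical, simplified `Plan` and `Expr` types; these are not DataFusion's `LogicalPlan` or `Expr`, and `eliminate_impossible_filter` is not an existing DataFusion optimizer rule. The key point is that SQL `WHERE` keeps only rows whose predicate is `true`. A constant NULL predicate (modeled here as `Literal(None)`) can therefore be handled exactly like a constant `false`.

```rust
// Hypothetical, simplified plan types for illustration only;
// these are NOT DataFusion's `LogicalPlan` / `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A boolean literal; `None` models `NULL::bool`.
    Literal(Option<bool>),
    /// A reference to a (hypothetical) boolean column.
    Column(String),
}

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    TableScan { table: String },
    Filter { predicate: Expr, input: Box<Plan> },
    EmptyRelation,
}

/// Hypothetical rewrite: a `Filter` whose predicate is the constant `false`
/// or NULL can never keep a row, so the whole subtree (including the scan)
/// is replaced by an empty relation.
fn eliminate_impossible_filter(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { predicate, input } => match predicate {
            // WHERE keeps only rows for which the predicate is TRUE,
            // so FALSE and NULL both discard every row.
            Expr::Literal(Some(false)) | Expr::Literal(None) => Plan::EmptyRelation,
            // Any other predicate is kept as-is; only its input is simplified.
            other => Plan::Filter {
                predicate: other,
                input: Box::new(eliminate_impossible_filter(*input)),
            },
        },
        // Scans and empty relations are returned unchanged.
        other => other,
    }
}

fn main() {
    let scan = Plan::TableScan { table: "t".to_string() };

    // Models `select x from t where null::bool`: no scan is needed.
    let null_filter = Plan::Filter {
        predicate: Expr::Literal(None),
        input: Box::new(scan.clone()),
    };
    println!("{:?}", eliminate_impossible_filter(null_filter)); // prints: EmptyRelation

    // A predicate on a column cannot be decided at plan time, so it is kept.
    let column_filter = Plan::Filter {
        predicate: Expr::Column("b".to_string()),
        input: Box::new(scan),
    };
    assert_eq!(eliminate_impossible_filter(column_filter.clone()), column_filter);
}
```

This sketch only looks at `Filter` nodes and their direct inputs. A real optimizer rule would walk every plan node type.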
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   I think this is a good first issue: it should be relatively simple to implement and would be a good introduction to DataFusion.

