adriangb commented on issue #3463:
URL: https://github.com/apache/datafusion/issues/3463#issuecomment-3841366173

   I'm +1 on disabling the hash join pushdown by default if it's slow.
   
   I think we can then split work up into 3 streams:
   - Rework the expressions themselves.
     - Pull statistics out of the hash / case expressions.
       - Global statistics guard: `col < 123 AND col > 5 AND CASE ... END`
       - Per-partition statistics guard: `CASE WHEN col < 23 AND col > 15 AND hash(col) % partitions = 0 THEN lookup ....`
   - Optimize expression evaluation (make hash computation more efficient, optimize the CASE expression, etc.)
   - Optimize expression optimization (the work you've been doing on `PhysicalExprSimplifier`)
   - Disable the expressions dynamically.
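
To make the statistics-guard idea from the first stream concrete, here is a minimal standalone sketch that uses only the standard library, not DataFusion types. `Bounds`, `Partition`, `GuardedFilter`, and `partition_of` are hypothetical names; `DefaultHasher` merely stands in for whatever hash the join uses; and the bounds are inclusive min/max values rather than the strict comparisons written above. The point is only the ordering: a cheap global range check first, then a per-partition range check, and the hash plus lookup only when both pass.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Inclusive min/max bounds; starts out empty (min > max).
#[derive(Clone, Copy)]
struct Bounds {
    min: i64,
    max: i64,
}

impl Bounds {
    fn empty() -> Self {
        Bounds { min: i64::MAX, max: i64::MIN }
    }
    fn extend(&mut self, v: i64) {
        self.min = self.min.min(v);
        self.max = self.max.max(v);
    }
    fn contains(&self, v: i64) -> bool {
        self.min <= v && v <= self.max
    }
}

/// Build-side keys of one hash partition plus their bounds.
struct Partition {
    bounds: Bounds,
    keys: HashSet<i64>,
}

/// Hypothetical stand-in for `hash(col) % partitions`.
fn partition_of(v: i64, partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    v.hash(&mut hasher);
    (hasher.finish() % partitions as u64) as usize
}

struct GuardedFilter {
    global: Bounds,
    partitions: Vec<Partition>,
}

impl GuardedFilter {
    fn build(keys: &[i64], partitions: usize) -> Self {
        assert!(partitions > 0, "need at least one partition");
        let mut global = Bounds::empty();
        let mut parts: Vec<Partition> = (0..partitions)
            .map(|_| Partition { bounds: Bounds::empty(), keys: HashSet::new() })
            .collect();
        for &k in keys {
            global.extend(k);
            let part = &mut parts[partition_of(k, partitions)];
            part.bounds.extend(k);
            part.keys.insert(k);
        }
        GuardedFilter { global, partitions: parts }
    }

    fn matches(&self, v: i64) -> bool {
        // Global guard: reject out-of-range values before hashing anything.
        if !self.global.contains(v) {
            return false;
        }
        let n = self.partitions.len();
        // Mirrors the CASE branches: bounds check first, hash only if it passes.
        for (i, part) in self.partitions.iter().enumerate() {
            if part.bounds.contains(v) && partition_of(v, n) == i {
                return part.keys.contains(&v);
            }
        }
        false
    }
}
```

With build-side keys `[16, 20, 22]`, a probe value of `500` is rejected by the global guard alone, while `17` passes the global guard and is rejected later, either by a partition's bounds or by the key lookup.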
   
   For that last point (disabling the expressions dynamically), I think it should be pretty easy to code up a wrapper `PhysicalExpr`:
   
   ```rust
   use std::sync::{Arc, RwLock};

   struct Selectivity {
       rows_processed: usize,
       rows_matched: usize,
   }
   
   struct OptionalFilterPhysicalExpr {
       target_selectivity: f64,
       inner: RwLock<Option<Arc<dyn PhysicalExpr>>>,
       selectivity: RwLock<Selectivity>,
   }
   
   // Sketch only: the other required `PhysicalExpr` methods are omitted.
   impl PhysicalExpr for OptionalFilterPhysicalExpr {
       fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
           // If `inner` is None, return all true.
           // Otherwise evaluate with `inner` and update `selectivity`.
           // If selectivity is < target after evaluating X rows, set `inner` to None
           // (which also drops the reference, so that any producers of dynamic
           // filters can stop updating them if they want).
           todo!()
       }
   }
   ```
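
The bookkeeping in those comments can be tried out without DataFusion. The following is a rough standalone sketch under several assumptions: `Predicate` is a hypothetical closure type standing in for `Arc<dyn PhysicalExpr>`, `min_rows` is a made-up name for the "X rows" threshold, a `Mutex` replaces the `RwLock` around the counters, and "selectivity" is taken to mean the fraction of rows the filter removes, which the comment above does not spell out.

```rust
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical stand-in for `Arc<dyn PhysicalExpr>`: maps a batch to a keep-mask.
type Predicate = Arc<dyn Fn(&[i64]) -> Vec<bool> + Send + Sync>;

#[derive(Default)]
struct Selectivity {
    rows_processed: usize,
    rows_matched: usize,
}

struct OptionalFilter {
    /// Minimum fraction of rows the filter must remove to stay enabled
    /// (an assumed reading of "target selectivity").
    target_selectivity: f64,
    /// Rows to observe before judging the filter (hypothetical name for "X").
    min_rows: usize,
    inner: RwLock<Option<Predicate>>,
    selectivity: Mutex<Selectivity>,
}

impl OptionalFilter {
    fn new(inner: Predicate, target_selectivity: f64, min_rows: usize) -> Self {
        OptionalFilter {
            target_selectivity,
            min_rows,
            inner: RwLock::new(Some(inner)),
            selectivity: Mutex::new(Selectivity::default()),
        }
    }

    fn is_enabled(&self) -> bool {
        self.inner.read().unwrap().is_some()
    }

    fn evaluate(&self, batch: &[i64]) -> Vec<bool> {
        // Clone the Arc so the lock is not held while the predicate runs.
        let inner = self.inner.read().unwrap().clone();
        let Some(pred) = inner else {
            // Filter was disabled: keep every row.
            return vec![true; batch.len()];
        };
        let mask = pred(batch);
        let matched = mask.iter().filter(|&&keep| keep).count();

        let disable = {
            let mut s = self.selectivity.lock().unwrap();
            s.rows_processed += batch.len();
            s.rows_matched += matched;
            let removed = 1.0 - s.rows_matched as f64 / s.rows_processed as f64;
            s.rows_processed >= self.min_rows && removed < self.target_selectivity
        };
        if disable {
            // Dropping our reference lets a producer notice that nobody
            // is reading the filter any more.
            *self.inner.write().unwrap() = None;
        }
        mask
    }
}
```

In this sketch a producer that keeps its own `Arc` to the predicate could poll `Arc::strong_count` to see when the wrapper has let go of it; whether DataFusion's dynamic filters would expose that signal the same way is not something the comment above settles.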


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

