andygrove commented on code in PR #8126:
URL: https://github.com/apache/arrow-datafusion/pull/8126#discussion_r1390040161


##########
datafusion/physical-plan/src/filter.rs:
##########
@@ -194,11 +194,13 @@ impl ExecutionPlan for FilterExec {
     fn statistics(&self) -> Result<Statistics> {
         let predicate = self.predicate();
 
+        let input_stats = self.input.statistics()?;
         let schema = self.schema();
         if !check_support(predicate, &schema) {
-            return Ok(Statistics::new_unknown(&schema));
+            // assume the worst case: that the filter is not selective at all
+            // and returns all the rows from its input
+            return Ok(input_stats.clone().into_inexact());

Review Comment:
   The talk [Join Order Optimization with (almost) no Statistics](https://www.youtube.com/watch?v=aNRoR0Z3SzU) focuses on full join reordering rather than just choosing the build side of a join, but it discusses selectivity estimates and is very relevant to this discussion. The authors found that assuming a selectivity of 0.2 worked well with TPC-H.
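To make the two options concrete, here is a minimal Rust sketch. It contrasts the fallback in the diff (keep the input row count but mark it inexact, i.e. selectivity 1.0) with scaling the input row count by a fixed selectivity of 0.2. `RowCount`, `estimate_filter_rows`, and `DEFAULT_FILTER_SELECTIVITY` are hypothetical names for illustration only. They are not DataFusion's `Statistics`/`Precision` API. Rounding up is also an assumption, chosen so that a non-empty input is never estimated at zero rows.

```rust
/// Hypothetical stand-in for a row-count estimate, loosely modeled on the
/// idea of exact vs. inexact statistics (not DataFusion's actual types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum RowCount {
    Exact(usize),
    Inexact(usize),
    Absent,
}

/// Hypothetical default selectivity, taken from the figure cited in the talk.
const DEFAULT_FILTER_SELECTIVITY: f64 = 0.2;

/// Estimate a filter's output row count when the predicate cannot be analyzed.
/// A known input count is scaled by `selectivity` and always marked inexact,
/// because the result is a guess, not a computed value.
fn estimate_filter_rows(input: RowCount, selectivity: f64) -> RowCount {
    // Clamp so a bad selectivity can never inflate or negate the estimate.
    let s = selectivity.clamp(0.0, 1.0);
    match input {
        RowCount::Exact(n) | RowCount::Inexact(n) => {
            // Round up so a non-empty input never estimates to zero rows.
            RowCount::Inexact((n as f64 * s).ceil() as usize)
        }
        RowCount::Absent => RowCount::Absent,
    }
}

fn main() {
    let input = RowCount::Exact(1_000);
    // Fallback from the diff: the filter passes every row (selectivity 1.0).
    let worst_case = estimate_filter_rows(input, 1.0);
    // Alternative from the talk: assume roughly 20% of rows survive.
    let default_guess = estimate_filter_rows(input, DEFAULT_FILTER_SELECTIVITY);
    println!("{:?} {:?}", worst_case, default_guess);
}
```

With 1,000 input rows, the worst-case fallback keeps the estimate at 1,000 (inexact), while the 0.2 default yields an inexact estimate of 200. That smaller estimate is what would make a filtered input more likely to be chosen as the build side.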


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.