Blizzara commented on code in PR #14194:
URL: https://github.com/apache/datafusion/pull/14194#discussion_r1925612355


##########
datafusion/substrait/src/logical_plan/producer.rs:
##########
@@ -559,12 +559,31 @@ pub fn from_table_scan(
     let table_schema = scan.source.schema().to_dfschema_ref()?;
     let base_schema = to_substrait_named_struct(&table_schema)?;
 
+    let best_effort_filter_option = if !scan.filters.is_empty() {
+        let table_schema_qualified = Arc::new(
+            DFSchema::try_from_qualified_schema(
+                scan.table_name.clone(),
+                &(scan.source.schema()),
+            )
+            .unwrap(),
+        );
+        let mut combined_expr = scan.filters[0].clone();
+        for i in 1..scan.filters.len() {
+            combined_expr = combined_expr.and(scan.filters[i].clone());
+        }
+        let best_effort_filter_expr =
+            producer.handle_expr(&combined_expr, &table_schema_qualified)?;
+        Some(Box::new(best_effort_filter_expr))
+    } else {
+        None
+    };
+
     Ok(Box::new(Rel {
         rel_type: Some(RelType::Read(Box::new(ReadRel {
             common: None,
             base_schema: Some(base_schema),
             filter: None,
-            best_effort_filter: None,
+            best_effort_filter: best_effort_filter_option,

Review Comment:
   Hm, from reading the Substrait plan it sounds like the "best effort" filter 
is something the read node _can_ use to drop rows but doesn't 
necessarily _have to_. So having "col1 < 5" as a best-effort filter would say 
that the read may drop any rows that don't match, but it's 
okay if some of them pass through. The read could then do something like read the Parquet 
footer: if it sees "min of col1 = 6", it could skip that whole file, but if it 
sees "min of col1 = 2, max = 700", it could include the full file.
   
   By contrast, the "filter" presumably must not let any 
non-matching rows pass.
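   
   To make the distinction concrete, here is a rough sketch of how I read the two semantics for "col1 < 5". `FileStats`, `can_skip_file`, `best_effort_read` and `exact_filter` are hypothetical names made up for illustration, not DataFusion or Substrait APIs, and the per-file min/max stand in for whatever statistics a Parquet footer would provide.

```rust
/// Min/max statistics for `col1` in one file.
/// Hypothetical type for illustration only, not a DataFusion or Substrait API.
struct FileStats {
    min: i64,
    max: i64,
}

/// Best-effort pruning for the predicate `col1 < bound`:
/// a file may be skipped only when none of its rows can match.
fn can_skip_file(stats: &FileStats, bound: i64) -> bool {
    debug_assert!(stats.min <= stats.max);
    stats.min >= bound
}

/// Best-effort read: drops whole files that cannot match, but returns
/// every row of the remaining files, including non-matching ones.
fn best_effort_read(files: &[(FileStats, Vec<i64>)], bound: i64) -> Vec<i64> {
    files
        .iter()
        .filter(|(stats, _)| !can_skip_file(stats, bound))
        .flat_map(|(_, rows)| rows.iter().copied())
        .collect()
}

/// Exact filter: no non-matching row is allowed to pass.
fn exact_filter(rows: Vec<i64>, bound: i64) -> Vec<i64> {
    rows.into_iter().filter(|v| *v < bound).collect()
}
```

   With one file at min = 6 and another at min = 2, max = 700, `best_effort_read(&files, 5)` skips the first file and returns the second in full, so a row like 700 still comes through. Only `exact_filter` guarantees that it is removed.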
   
   The docstring for `scan.filters` says "/// Optional expressions to be used 
as filters by the table provider", so I'm not sure which of the two it falls under.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

