vbarua commented on code in PR #14194: URL: https://github.com/apache/datafusion/pull/14194#discussion_r1926006701
##########
datafusion/substrait/src/logical_plan/producer.rs:
##########
@@ -559,12 +559,31 @@ pub fn from_table_scan(
     let table_schema = scan.source.schema().to_dfschema_ref()?;
     let base_schema = to_substrait_named_struct(&table_schema)?;
 
+    let best_effort_filter_option = if !scan.filters.is_empty() {
+        let table_schema_qualified = Arc::new(
+            DFSchema::try_from_qualified_schema(
+                scan.table_name.clone(),
+                &(scan.source.schema()),
+            )
+            .unwrap(),
+        );
+        let mut combined_expr = scan.filters[0].clone();
+        for i in 1..scan.filters.len() {
+            combined_expr = combined_expr.and(scan.filters[i].clone());
+        }
+        let best_effort_filter_expr =
+            producer.handle_expr(&combined_expr, &table_schema_qualified)?;
+        Some(Box::new(best_effort_filter_expr))
+    } else {
+        None
+    };
+
     Ok(Box::new(Rel {
         rel_type: Some(RelType::Read(Box::new(ReadRel {
             common: None,
             base_schema: Some(base_schema),
             filter: None,
-            best_effort_filter: None,
+            best_effort_filter: best_effort_filter_option,

Review Comment:
We should be setting the `filter` field for this, and not `best_effort_filter`. The `filter` on the DataFusion TableScan _must_ be applied, so it should not be treated here as a best-effort filter, which is potentially ignorable.

My understanding of the best-effort filter is that it is an optimization that can be used to reduce the number of rows read from a datasource.

IMO, the docs on the best-effort filter could use some clarification. I've filed a [ticket](https://github.com/substrait-io/substrait/issues/778) for this.
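To make the suggestion concrete, here is a minimal sketch of the intended shape. It deliberately uses stand-in types: `Expr`, `ReadRel`, `combine_filters`, and `read_rel_from_filters` below are hypothetical simplifications for illustration, not the real DataFusion or Substrait definitions. It only shows the scan filters being AND-ed together and placed in `filter`, with `best_effort_filter` left unset.

```rust
// Stand-in types for illustration only. The real producer works with
// DataFusion's `Expr` and the Substrait `ReadRel` message instead.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    // A single predicate, kept as text to keep the sketch small.
    Predicate(String),
    // Conjunction of two expressions.
    And(Box<Expr>, Box<Expr>),
}

impl Expr {
    // Mirrors the `.and(...)` call used in the diff.
    fn and(self, other: Expr) -> Expr {
        Expr::And(Box::new(self), Box::new(other))
    }
}

#[derive(Debug, Default, PartialEq)]
struct ReadRel {
    // Must be applied by the consumer.
    filter: Option<Box<Expr>>,
    // May be ignored by the consumer.
    best_effort_filter: Option<Box<Expr>>,
}

// Left-fold all filters into one conjunction; `None` when there are no filters.
fn combine_filters(filters: &[Expr]) -> Option<Expr> {
    filters.iter().cloned().reduce(|acc, f| acc.and(f))
}

// Hypothetical helper: builds the read relation with the mandatory
// scan filters placed in `filter` rather than `best_effort_filter`.
fn read_rel_from_filters(filters: &[Expr]) -> ReadRel {
    ReadRel {
        filter: combine_filters(filters).map(Box::new),
        best_effort_filter: None,
    }
}

fn main() {
    let filters = vec![
        Expr::Predicate("a > 1".to_string()),
        Expr::Predicate("b < 2".to_string()),
    ];
    let rel = read_rel_from_filters(&filters);
    println!("{rel:?}");
}
```

Using `reduce` gives the same left-to-right conjunction as the index loop in the diff, and it handles the empty case without a separate `is_empty()` check.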