Blizzara commented on code in PR #14194:
URL: https://github.com/apache/datafusion/pull/14194#discussion_r1925612355
##########
datafusion/substrait/src/logical_plan/producer.rs:
##########
@@ -559,12 +559,31 @@ pub fn from_table_scan(
     let table_schema = scan.source.schema().to_dfschema_ref()?;
     let base_schema = to_substrait_named_struct(&table_schema)?;
 
+    let best_effort_filter_option = if !scan.filters.is_empty() {
+        let table_schema_qualified = Arc::new(
+            DFSchema::try_from_qualified_schema(
+                scan.table_name.clone(),
+                &(scan.source.schema()),
+            )
+            .unwrap(),
+        );
+        let mut combined_expr = scan.filters[0].clone();
+        for i in 1..scan.filters.len() {
+            combined_expr = combined_expr.and(scan.filters[i].clone());
+        }
+        let best_effort_filter_expr =
+            producer.handle_expr(&combined_expr, &table_schema_qualified)?;
+        Some(Box::new(best_effort_filter_expr))
+    } else {
+        None
+    };
+
     Ok(Box::new(Rel {
         rel_type: Some(RelType::Read(Box::new(ReadRel {
             common: None,
             base_schema: Some(base_schema),
             filter: None,
-            best_effort_filter: None,
+            best_effort_filter: best_effort_filter_option,

Review Comment:
Hm, from reading the Substrait plan it sounds like the "best effort" filter would be something that the read node _can_ drop rows based on but doesn't necessarily _have to_. So having "col1 < 5" as best-effort filter would say that any rows where that doesn't match can be dropped by the read, but it's okay if some of those pass. Then the read could do something like read parquet footer, if it sees "min of col1 = 6", it could skip that whole file, but if it sees "min of col1 = 2, max = 700", it could include the full file.

As compared to the "filter", which presumably should not let any non-fulfilling rows pass.

The docstring for `scan.filters` says "/// Optional expressions to be used as filters by the table provider", so not sure which one that falls under?
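
To make the distinction concrete, here is a rough sketch of how I read the two semantics for "col1 < 5". It is not DataFusion or Substrait code: `DataFile`, `best_effort_read` and `exact_filter` are hypothetical names, and the `min` field only stands in for the statistic a Parquet footer would provide (the max isn't needed to prune a "<" predicate, so it is left out).

```rust
/// Hypothetical stand-in for one data file: `min` plays the role of the
/// column minimum a Parquet footer would record for col1.
struct DataFile {
    min: i64,
    rows: Vec<i64>,
}

/// Best-effort semantics for `col1 < bound`: skip a file only when its
/// statistics prove that no row can match; otherwise return the whole file,
/// including rows that do not satisfy the predicate.
fn best_effort_read(files: &[DataFile], bound: i64) -> Vec<i64> {
    files
        .iter()
        .filter(|f| f.min < bound) // min >= bound means no row can be < bound
        .flat_map(|f| f.rows.iter().copied())
        .collect()
}

/// Exact semantics: no row that fails the predicate may pass.
fn exact_filter(rows: Vec<i64>, bound: i64) -> Vec<i64> {
    rows.into_iter().filter(|v| *v < bound).collect()
}

fn main() {
    let files = vec![
        // min of col1 = 6: the whole file can be skipped.
        DataFile { min: 6, rows: vec![6, 42, 90] },
        // min of col1 = 2, max = 700: the file is included in full.
        DataFile { min: 2, rows: vec![2, 4, 700] },
    ];

    // The best-effort read drops the first file but lets 700 through.
    let scanned = best_effort_read(&files, 5);
    assert_eq!(scanned, vec![2, 4, 700]);

    // An exact filter on top removes the remaining non-matching row.
    let exact = exact_filter(scanned, 5);
    assert_eq!(exact, vec![2, 4]);
}
```

If `scan.filters` is only ever applied in the first way, a consumer would still need an exact filter above the read to get correct results, which is why it matters which field the producer writes to.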