nseekhao commented on code in PR #14194:
URL: https://github.com/apache/datafusion/pull/14194#discussion_r1975616391
##########
datafusion/substrait/src/logical_plan/producer.rs:
##########
@@ -559,12 +559,31 @@ pub fn from_table_scan(
let table_schema = scan.source.schema().to_dfschema_ref()?;
let base_schema = to_substrait_named_struct(&table_schema)?;
+ let best_effort_filter_option = if !scan.filters.is_empty() {
+ let table_schema_qualified = Arc::new(
+ DFSchema::try_from_qualified_schema(
+ scan.table_name.clone(),
+ &(scan.source.schema()),
+ )
+ .unwrap(),
+ );
+ let mut combined_expr = scan.filters[0].clone();
+ for i in 1..scan.filters.len() {
+ combined_expr = combined_expr.and(scan.filters[i].clone());
+ }
+ let best_effort_filter_expr =
+ producer.handle_expr(&combined_expr, &table_schema_qualified)?;
+ Some(Box::new(best_effort_filter_expr))
+ } else {
+ None
+ };
+
Ok(Box::new(Rel {
rel_type: Some(RelType::Read(Box::new(ReadRel {
common: None,
base_schema: Some(base_schema),
filter: None,
- best_effort_filter: None,
+ best_effort_filter: best_effort_filter_option,
Review Comment:
Thanks @westonpace ! Makes sense, but let me make sure my understanding is
correct.
So let's say we have
```sql
select c2 from t1 where c1 > 0;
```
The logical plan will look something like:
```
Projection:
expr: c1
Filter:
predicate: (c1 > 0)
TableScan:
table_name: t1
filters: None
...
```
During logical plan optimization, the predicate from `Filter` will get
pushed down to `TableScan`. Let's say data is partitioned and metadata contains
ranges of `c1`. Since we could have partitions that contain both positive and
negative `c1` values, my understanding is that the filter will be `Inexact` to
make sure the physical plan knows to keep the original `FilterExec`. So the
logical plan will look something like:
```
Projection:
expr: c1
Filter:
predicate: (c1 > 0)
TableScan:
table_name: t1
filters: [(c1>0, Inexact)]
...
```
So the logical `TableScan` is saying:
* "The filter `c1 > 0` has to be applied" -- communicated via
`TableScan#filters` containing `c1 > 0` expression
* "But I cannot be sure that I won't get in extra rows due to how the data
is partitioned" -- communicated via `Inexact`
* "So please make sure the filter relation that asked for this condition
stays when we translate this logical plan to physical plan. To ensure the
correct results" -- communicated via `Inexact`
Is this a correct understanding?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]