thinkharderdev commented on code in PR #3470:
URL: https://github.com/apache/arrow-datafusion/pull/3470#discussion_r970907156
##########
datafusion/core/src/physical_plan/file_format/row_filter.rs:
##########
@@ -202,7 +202,18 @@ impl<'a> ExprRewriter for FilterCandidateBuilder<'a> {
fn mutate(&mut self, expr: Expr) -> Result<Expr> {
if let Expr::Column(Column { name, .. }) = &expr {
if self.file_schema.field_with_name(name).is_err() {
- return Ok(Expr::Literal(ScalarValue::Null));
+ // the column expr must be in the table schema
+ return match self.table_schema.field_with_name(name) {
Review Comment:
In the case where the table has projected columns (like in
`parquet_multiple_partitions` where we project columns from the hive partition
directory structure) then the projected columns are not in the table schema.
If the predicate contains any such projected columns then we ignore it
anyway. So I think this change is fine but with the one adjustment that we
don't return an error. So maybe
```
if let Ok(field) self.table_schema.field_with_name(name) {
// return the null value corresponding to the data type
let null_value = ScalarValue::try_from(field.data_type())?;
return Ok(Expr::Literal(null_value));
}
```
In this case we just won't rewrite the expression but also
`FilterCandidateBuilder.projected_columns` will be `true` so we won't crate a
row filter predicate for the `Expr`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]