aaraujo opened a new pull request, #17626:
URL: https://github.com/apache/datafusion/pull/17626
When aggregation operations produce unqualified column names in their output
schema, subsequent operations (like binary expressions) may still reference the
original qualified names. This fix adds fallback resolution that attempts to
match qualified column references to unqualified column names when the exact
qualified match is not found.
This resolves errors like:
'No field named table.column. Valid fields are column'
that occur in expressions like `avg(table.column) / 1024` where the
aggregation produces an unqualified 'column' field but the division still
references 'table.column'.
Includes a test case demonstrating the issue and the fix.
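
The fallback described above can be sketched in isolation. This is a minimal illustration only, not the code in this PR: `Field`, `ColumnRef`, and `resolve` are hypothetical stand-ins for DataFusion's schema types, and the sketch assumes the fallback simply retries the lookup by bare column name.

```rust
/// Hypothetical stand-in for a schema field: optional table qualifier plus name.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    qualifier: Option<String>,
    name: String,
    nullable: bool,
}

/// Hypothetical stand-in for a column reference such as `metrics.value`.
struct ColumnRef {
    qualifier: Option<String>,
    name: String,
}

/// Resolve `col` against `fields`: try the exact (qualifier, name) match
/// first, then, for qualified references only, retry by name alone.
fn resolve<'a>(fields: &'a [Field], col: &ColumnRef) -> Option<&'a Field> {
    // Step 1: exact match on both qualifier and name.
    let exact = fields
        .iter()
        .find(|f| f.qualifier == col.qualifier && f.name == col.name);
    if exact.is_some() {
        return exact;
    }
    // Step 2 (assumed fallback): drop the qualifier and match on name only.
    if col.qualifier.is_some() {
        return fields.iter().find(|f| f.name == col.name);
    }
    None
}

fn main() {
    // Schema simulating aggregation output: a single unqualified field.
    let schema = vec![Field {
        qualifier: None,
        name: "value".to_string(),
        nullable: true,
    }];

    // A later expression still refers to the column as `metrics.value`.
    let col = ColumnRef {
        qualifier: Some("metrics".to_string()),
        name: "value".to_string(),
    };
    let field = resolve(&schema, &col).expect("fallback should find `value`");
    assert!(field.nullable);
    println!("metrics.value resolved to {:?}", field);

    // A name that is absent from the schema still fails to resolve.
    let missing = ColumnRef {
        qualifier: Some("metrics".to_string()),
        name: "other".to_string(),
    };
    assert!(resolve(&schema, &missing).is_none());
}
```

Note that a name-only match can be ambiguous when several fields share a name; this sketch takes the first match, which may differ from how the real schema lookup handles that case.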
## Which issue does this PR close?
This PR addresses a schema resolution issue discovered during integration
testing. No existing issue was filed.
## Rationale for this change
Currently, when an aggregation function produces an unqualified output
schema (e.g., just "value" without a table qualifier), subsequent binary
operations that reference the original qualified column name fail with a schema
resolution error. This is a common pattern in SQL queries where aggregations
are combined with arithmetic operations.
For example:
```sql
SELECT avg(metrics.value) / 1024 FROM metrics
```
The aggregation produces an unqualified "value" field, but the division
operation still carries the qualified reference "metrics.value",
causing the query to fail.
## What changes are included in this PR?
- Modified `Expr::Column` case in `get_type()` and `nullable()` methods in
`datafusion/expr/src/expr_schema.rs` to add fallback resolution
- When a qualified column reference is not found, attempts to resolve
using just the column name without the qualifier
- Added comprehensive test case `test_qualified_column_after_aggregation`
that demonstrates the issue and validates the fix
## Are these changes tested?
Yes, includes a new test case that:
- Creates a schema simulating aggregation output (unqualified fields)
- Tests resolution of qualified column references against this schema
- Validates both direct column access and binary expressions
- Verifies both data type and nullability resolution
## Are there any user-facing changes?
This is a bug fix that makes previously failing queries work correctly. No
breaking changes to existing functionality.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]