aaraujo commented on PR #17634:
URL: https://github.com/apache/datafusion/pull/17634#issuecomment-3316071773
@Jefffrey Absolutely! Here's a standalone test case that reproduces the
issue without external dependencies:
## Standalone Reproduction
```rust
// Save as: datafusion/core/examples/qualified_column_repro.rs
// Run with: cargo run --example qualified_column_repro
use datafusion::arrow::datatypes::{DataType, Field};
use datafusion::common::{Column, DFSchema, Result, TableReference};
use datafusion::logical_expr::{lit, BinaryExpr, Expr, ExprSchemable, Operator};

#[tokio::main]
async fn main() -> Result<()> {
    // Create a schema that represents the output of an aggregation
    // Aggregations produce unqualified column names in their output schema
    let post_agg_schema = DFSchema::from_unqualified_fields(
        vec![Field::new("avg(metrics.value)", DataType::Float64, true)].into(),
        Default::default(),
    )?;
    println!(
        "Post-aggregation schema has field: {:?}",
        post_agg_schema.fields()[0].name()
    );

    // Create a qualified column reference (as the optimizer might produce)
    let qualified_col = Expr::Column(Column::new(
        Some(TableReference::bare("metrics")),
        "avg(metrics.value)",
    ));

    // Create a binary expression: metrics.avg(metrics.value) / 1024
    let binary_expr = Expr::BinaryExpr(BinaryExpr::new(
        Box::new(qualified_col.clone()),
        Operator::Divide,
        Box::new(lit(1024.0)),
    ));

    println!("\nTrying to resolve qualified column: metrics.avg(metrics.value)");
    match qualified_col.get_type(&post_agg_schema) {
        Ok(dtype) => println!("✓ SUCCESS: Resolved to type {:?}", dtype),
        Err(e) => println!("✗ ERROR: {}", e),
    }

    println!("\nTrying to resolve binary expression: metrics.avg(metrics.value) / 1024");
    match binary_expr.get_type(&post_agg_schema) {
        Ok(dtype) => println!("✓ SUCCESS: Resolved to type {:?}", dtype),
        Err(e) => println!("✗ ERROR: {}", e),
    }

    Ok(())
}
```
## Results
Without the fix:
Post-aggregation schema has field: "avg(metrics.value)"
Trying to resolve qualified column: metrics.avg(metrics.value)
✗ ERROR: Schema error: No field named metrics."avg(metrics.value)". Did you mean 'avg(metrics.value)'?.
Trying to resolve binary expression: metrics.avg(metrics.value) / 1024
✗ ERROR: Schema error: No field named metrics."avg(metrics.value)". Did you mean 'avg(metrics.value)'?.
With the fix:
Post-aggregation schema has field: "avg(metrics.value)"
Trying to resolve qualified column: metrics.avg(metrics.value)
✓ SUCCESS: Resolved to type Float64
Trying to resolve binary expression: metrics.avg(metrics.value) / 1024
✓ SUCCESS: Resolved to type Float64
## The Issue
The problem occurs when:
1. An aggregation produces an unqualified output schema (e.g., `avg(metrics.value)` becomes just `"avg(metrics.value)"`, without the table qualifier).
2. Subsequent operations (like binary expressions) still reference the qualified column name (`metrics."avg(metrics.value)"`).
3. Schema resolution fails because the qualified name doesn't exist in the post-aggregation schema.
This pattern commonly occurs in query builders, ORMs, and SQL translation
layers that maintain qualified references throughout the query pipeline for
clarity and correctness.
## The Fix
The fix adds a fallback mechanism in `expr_schema.rs` that:
- First attempts to resolve the column with its qualifier
- If that fails AND the column has a qualifier, tries resolving without the
qualifier
- Only returns an error if both attempts fail (preserving the original error
message)
This conservative approach maintains backward compatibility while enabling
legitimate query patterns that were previously failing.
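
To make the fallback order concrete, here is a minimal std-only sketch of the lookup logic described above. It is not the code from the PR and does not use DataFusion's types: `Schema`, `Field`, `strict_lookup`, and `lookup_with_fallback` are hypothetical names made up for illustration. As a simplifying assumption, an unqualified request in this toy model matches the first field with that name, whatever its qualifier.

```rust
// Hypothetical toy model of the fallback; NOT DataFusion's real types or API.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Float64,
}

#[derive(Debug)]
struct Field {
    qualifier: Option<String>,
    name: String,
    data_type: DataType,
}

struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    /// Strict lookup: a qualified request must match both qualifier and name.
    /// Assumption for this sketch: an unqualified request matches by name only.
    fn strict_lookup(&self, qualifier: Option<&str>, name: &str) -> Result<&Field, String> {
        self.fields
            .iter()
            .find(|f| {
                f.name == name && (qualifier.is_none() || f.qualifier.as_deref() == qualifier)
            })
            .ok_or_else(|| match qualifier {
                Some(q) => format!("No field named {q}.\"{name}\""),
                None => format!("No field named \"{name}\""),
            })
    }

    /// Try the qualified lookup first; if it fails and a qualifier was given,
    /// retry without the qualifier. If both fail, return the original error.
    fn lookup_with_fallback(&self, qualifier: Option<&str>, name: &str) -> Result<&Field, String> {
        match self.strict_lookup(qualifier, name) {
            Ok(field) => Ok(field),
            Err(original) if qualifier.is_some() => {
                self.strict_lookup(None, name).map_err(|_| original)
            }
            Err(original) => Err(original),
        }
    }
}

fn main() {
    // Post-aggregation schema: one unqualified field.
    let schema = Schema {
        fields: vec![Field {
            qualifier: None,
            name: "avg(metrics.value)".to_string(),
            data_type: DataType::Float64,
        }],
    };

    // The strict lookup rejects the qualified reference.
    assert!(schema
        .strict_lookup(Some("metrics"), "avg(metrics.value)")
        .is_err());

    // The fallback resolves it against the unqualified field.
    let field = schema
        .lookup_with_fallback(Some("metrics"), "avg(metrics.value)")
        .unwrap();
    println!("Resolved to type {:?}", field.data_type);

    // When both attempts fail, the error from the qualified attempt is kept.
    let err = schema
        .lookup_with_fallback(Some("metrics"), "missing")
        .unwrap_err();
    println!("Error: {err}");
}
```

Keeping the error from the first (qualified) attempt means callers that never hit the fallback see the same message as before, which is the backward-compatibility property the fix aims for.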