abey79 opened a new issue, #15922: URL: https://github.com/apache/datafusion/issues/15922
### Describe the bug

Given a table in a `SessionContext` and the `RecordBatch` that backs it (e.g. through `ctx.register_batch()`), I want to refer to the table's columns using the field names found in the `RecordBatch`'s schema. In some cases, this fails with `col(col_name)`. I must instead use `col(format!("\"{col_name}\""))`, which is hard to discover and likely to be missed even when one is aware of the issue.

This is compounded by the fact that only specific column names trigger the failure: "A" does, but "Column A" does not. (I'm now assuming that the space in the latter triggers some auto-escaping mechanism.)

### To Reproduce

```rust
use arrow::array::{Int64Array, RecordBatch};
use arrow::datatypes::{DataType, Field, Schema};
use datafusion::common::DataFusionError;
use datafusion::logical_expr::col;
use datafusion::prelude::SessionContext;
use std::sync::Arc;

async fn test_single_column(col_name: &str) -> Result<(), DataFusionError> {
    // create a simple batch
    let column = Int64Array::from(vec![1, 2, 3]);
    let schema = Schema::new(vec![Field::new(col_name, DataType::Int64, false)]);
    println!("Column name: {col_name}");
    println!("Initial arrow schema name: {}", schema.fields()[0].name());

    let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(column)])
        .expect("could not create record batch");

    // create a DataFusion context
    let ctx = SessionContext::new();
    ctx.register_batch("test", batch)?;
    println!(
        "Session context schema name: {}",
        ctx.table("test").await?.schema().fields()[0].name()
    );

    let result = ctx
        .table("test")
        .await?
        .select(vec![col(col_name)])?
        // use this instead to avoid the issue
        //.select(vec![col(format!("\"{col_name}\""))])?
        .collect()
        .await?
        .into_iter()
        .last()
        .ok_or(DataFusionError::External("no batch returned".into()))?;

    println!(
        "Result batch schema name: {}",
        result.schema().fields()[0].name()
    );

    Ok(())
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let names = &["A", "a", "Column A"];
    for name in names {
        if let Err(e) = test_single_column(name).await {
            eprintln!("Error processing column name '{}': {}", name, e);
        }
        println!("--------------------------------");
    }
    Ok(())
}
```

### Result:

```
Column name: A
Initial arrow schema name: A
Session context schema name: A
Error processing column name 'A': Schema error: No field named a. Valid fields are test."A".
--------------------------------
Column name: a
Initial arrow schema name: a
Session context schema name: a
Result batch schema name: a
--------------------------------
Column name: Column A
Initial arrow schema name: Column A
Session context schema name: Column A
Result batch schema name: Column A
```

Noteworthy:
- The error message is particularly confusing, since I _did_ use `"A"`.
- The behaviour is (seemingly) inconsistent between "A" and "Column A", with the latter actually working.

### Expected behavior

All three test cases pass.

### Additional context

In this test case, as in the actual codebase I'm working on, I am not making use of any SQL. This makes the name casing issue particularly unexpected.

Probably related:
- https://github.com/apache/datafusion/issues/14832
- https://github.com/apache/datafusion/issues/14373
- https://github.com/apache/datafusion/issues/13649
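
The quoting workaround mentioned in this report, `col(format!("\"{col_name}\""))`, could be wrapped in a small helper so that it is applied consistently. The following is only a minimal sketch: `quote_identifier` is a hypothetical name, not a DataFusion API, and doubling an embedded `"` follows the general SQL convention for delimited identifiers, which is an assumption that has not been checked against DataFusion's identifier parser.

```rust
/// Hypothetical helper (not part of DataFusion): wrap a raw Arrow field name
/// in double quotes so that it reads as a quoted identifier.
fn quote_identifier(name: &str) -> String {
    // Assumption: an embedded `"` is escaped by doubling it, as in standard SQL.
    let escaped = name.replace('"', "\"\"");
    format!("\"{escaped}\"")
}

fn main() {
    // The three column names used in the reproduction.
    for name in ["A", "a", "Column A"] {
        println!("{name} -> {}", quote_identifier(name));
    }
}
```

With such a helper, the failing call in the reproduction would become `col(quote_identifier(col_name))`, which is equivalent to the commented-out workaround for names that contain no double quotes.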