abey79 opened a new issue, #15922:
URL: https://github.com/apache/datafusion/issues/15922

   ### Describe the bug
   
   Given a table in a `SessionContext` and the `RecordBatch` that backs it 
(e.g. through `ctx.register_batch()`), I want to refer to the table's columns 
using the field names found in the `RecordBatch`'s schema.
   
   In some cases, `col(col_name)` fails, and I must instead use
   `col(format!("\"{col_name}\""))`. This workaround is hard to discover and
   easy to forget even when one is aware of the issue. The problem is compounded
   by the fact that only specific column names trigger the failure: "A" does,
   but "Column A" does not. (I assume that the space in the latter triggers
   some auto-escaping mechanism.)
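
   To make the workaround harder to get wrong, the quoting can be centralized in
   one place. The following is only a sketch: `quote_column_name` is a
   hypothetical helper (not a DataFusion API), and it assumes that an embedded
   double quote is escaped by doubling it, as in SQL quoted identifiers.

   ```rust
   /// Hypothetical helper, not part of DataFusion.
   /// Wraps a column name in double quotes so that it is taken verbatim.
   /// Assumption: an embedded `"` is escaped by doubling it, as in SQL.
   fn quote_column_name(name: &str) -> String {
       format!("\"{}\"", name.replace('"', "\"\""))
   }
   ```

   With this, the call in the reproduction would read
   `col(quote_column_name(col_name))`. I believe `datafusion::prelude::ident`
   also builds a column expression without normalizing the name, but that is
   worth checking against the DataFusion version in use.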
   
   ### To Reproduce
   
   ```rust
   use arrow::array::{Int64Array, RecordBatch};
   use arrow::datatypes::{DataType, Field, Schema};
   use datafusion::common::DataFusionError;
   use datafusion::logical_expr::col;
   use datafusion::prelude::SessionContext;
   use std::sync::Arc;
   
   async fn test_single_column(col_name: &str) -> Result<(), DataFusionError> {
       // create a simple batch
       let column = Int64Array::from(vec![1, 2, 3]);
       let schema = Schema::new(vec![Field::new(col_name, DataType::Int64, false)]);
   
       println!("Column name: {col_name}");
       println!("Initial arrow schema name: {}", schema.fields()[0].name());
   
       let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(column)])
           .expect("could not create record batch");
   
       // create a DataFusion context
       let ctx = SessionContext::new();
       ctx.register_batch("test", batch)?;
   
       println!(
           "Session context schema name: {}",
           ctx.table("test").await?.schema().fields()[0].name()
       );
   
       let result = ctx
           .table("test")
           .await?
           .select(vec![col(col_name)])?
           // use this instead to avoid the issue
           //.select(vec![col(format!("\"{col_name}\""))])?
           .collect()
           .await?
           .into_iter()
           .last()
           .ok_or_else(|| DataFusionError::External("no batch returned".into()))?;
   
       println!(
           "Result batch schema name: {}",
           result.schema().fields()[0].name()
       );
   
       Ok(())
   }
   
   #[tokio::main]
   async fn main() -> anyhow::Result<()> {
       let names = &["A", "a", "Column A"];
   
       for name in names {
           if let Err(e) = test_single_column(name).await {
               eprintln!("Error processing column name '{}': {}", name, e);
           }
   
           println!("--------------------------------");
       }
   
       Ok(())
   }
   ```
   
   ### Result:
   
   ```
   Column name: A
   Initial arrow schema name: A
   Session context schema name: A
   Error processing column name 'A': Schema error: No field named a. Valid fields are test."A".
   --------------------------------
   Column name: a
   Initial arrow schema name: a
   Session context schema name: a
   Result batch schema name: a
   --------------------------------
   Column name: Column A
   Initial arrow schema name: Column A
   Session context schema name: Column A
   Result batch schema name: Column A
   ```
   
   Noteworthy:
   - The error message is particularly confusing, since I _did_ use `"A"`.
   - The behaviour is (seemingly) inconsistent between "A" and "Column A", with
     the latter actually working.
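
   The three results would be consistent with SQL-style identifier handling
   applied to the string passed to `col()`. The toy model below is a guess at
   that behaviour, not DataFusion's actual code: `resolve_name` and
   `is_plain_identifier` are hypothetical names, and the rule that names which
   are not plain identifiers are kept verbatim simply encodes my assumption
   about "Column A".

   ```rust
   /// Hypothetical check: ASCII letter or `_` first, then ASCII letters, digits, or `_`.
   fn is_plain_identifier(s: &str) -> bool {
       let mut chars = s.chars();
       matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
           && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
   }

   /// Toy model (an assumption, not DataFusion's implementation) of how a
   /// name passed to `col()` might be turned into the field name to look up.
   fn resolve_name(input: &str) -> String {
       // A name wrapped in double quotes is taken verbatim, quotes removed.
       if let Some(inner) = input.strip_prefix('"').and_then(|s| s.strip_suffix('"')) {
           return inner.to_string();
       }
       if is_plain_identifier(input) {
           // An unquoted plain identifier is folded to lowercase.
           input.to_ascii_lowercase()
       } else {
           // Assumed fallback: anything else (e.g. containing a space) is kept as-is.
           input.to_string()
       }
   }
   ```

   Under this model, "A" resolves to `a` (which does not exist in the schema),
   "a" and "Column A" resolve to themselves, and `"A"` in quotes resolves to `A`.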
   
   
   ### Expected behavior
   
   All three test cases pass.
   
   ### Additional context
   
   In this test case, as in the actual codebase I'm working on, I am not
   using any SQL. This makes the name-casing issue particularly unexpected.
   
   Probably related:
   - https://github.com/apache/datafusion/issues/14832
   - https://github.com/apache/datafusion/issues/14373
   - https://github.com/apache/datafusion/issues/13649


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
