johnkerl opened a new issue, #16017:
URL: https://github.com/apache/datafusion/issues/16017

   ### Describe the bug
   
   Calling `describe` on a DataFrame whose column names contain upper-case letters or dots fails with
   ```
   Error: Execution("Schema error: No field named ...")
   ```
   
   ### To Reproduce
   
   `./Cargo.toml`
   ```
   [package]
   name = "describe-with-case"
   version = "0.1.0"
   edition = "2024"
   
   [dependencies]
   "datafusion" = "47"
   tokio = { version = "1.45", features = ["rt", "rt-multi-thread", "macros"] }
   ```
   
   `src/main.rs`
   ```
   use std::env;
   use datafusion::error::Result;
   use datafusion::execution::context::SessionContext;
   use datafusion::prelude::CsvReadOptions;
   
   #[tokio::main]
   async fn main() -> Result<()> {
       let args: Vec<String> = env::args().collect();
       let ctx = SessionContext::new();
       for arg in args.iter().skip(1) {
           println!();
           println!("Filename: {arg}");
   
           let df = ctx.read_csv(arg, CsvReadOptions::new()).await?;
           let stat = df.describe().await?.collect().await?;
           println!("{stat:?}");
       }
   
       Ok(())
   }
   ```
   
   `./desc-good.csv`
   ```
   abc,def,ghi
   1,2,3
   4,5,6
   7,8,9
   ```
   
   `./desc-bad.csv`
   ```
   abc,Def,gh.i
   1,2,3
   4,5,6
   7,8,9
   ```
   
   ### Expected behavior
   
   With column names `abc,def,ghi` we see
   
   `cargo run ./desc-good.csv`
   ```
   Argument ./desc-good.csv
   [RecordBatch { schema: Schema { fields: [Field { name: "describe", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "abc", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "def", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "ghi", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [StringArray
   [
     "count",
     "null_count",
     "mean",
     "std",
     "min",
     "max",
     "median",
   ], PrimitiveArray<Float64>
   [
     3.0,
     0.0,
     4.0,
     3.0,
     1.0,
     7.0,
     4.0,
   ], PrimitiveArray<Float64>
   [
     3.0,
     0.0,
     5.0,
     3.0,
     2.0,
     8.0,
     5.0,
   ], PrimitiveArray<Float64>
   [
     3.0,
     0.0,
     6.0,
     3.0,
     3.0,
     9.0,
     6.0,
   ]], row_count: 7 }]
   ```
   
   With column names `abc,Def,gh.i` I would expect similar output, but I actually see:
   
   `cargo run ./desc-bad.csv`
   ```
   Argument ./desc-bad.csv
   Error: Execution("Schema error: No field named def. Valid fields are \"?table?\".abc, \"?table?\".\"Def\", \"?table?\".\"gh.i\".")
   ```
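
Judging only from the error message above (`No field named def` while `"Def"` is listed as a valid field), one plausible reading is that the column names are being treated as unquoted SQL identifiers somewhere along the way: unquoted text is lower-cased, and a dot is read as a qualifier separator. The sketch below is a standalone model of that behaviour, not DataFusion's code; `parse_identifier` and `quote_identifier` are hypothetical helpers written for illustration.

```rust
/// Hypothetical model of SQL-style identifier parsing (NOT DataFusion's code).
/// Unquoted text is lower-cased, a dot outside quotes splits the identifier
/// into parts, and double-quoted text is kept verbatim.
fn parse_identifier(input: &str) -> Vec<String> {
    let mut parts = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // A doubled quote inside a quoted section is an escaped quote.
            '"' if in_quotes && chars.peek() == Some(&'"') => {
                chars.next();
                current.push('"');
            }
            // Any other quote opens or closes a quoted section.
            '"' => in_quotes = !in_quotes,
            // A dot outside quotes ends the current part.
            '.' if !in_quotes => parts.push(std::mem::take(&mut current)),
            // Quoted characters are preserved exactly.
            _ if in_quotes => current.push(c),
            // Unquoted characters are normalized to lower case.
            _ => current.extend(c.to_lowercase()),
        }
    }
    parts.push(current);
    parts
}

/// Hypothetical helper: wrap a raw column name in double quotes so the
/// model above returns it unchanged.
fn quote_identifier(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

fn main() {
    for name in ["abc", "Def", "gh.i"] {
        let raw = parse_identifier(name);
        let quoted = parse_identifier(&quote_identifier(name));
        println!("{name:>5}: unquoted -> {raw:?}, quoted -> {quoted:?}");
    }
}
```

Under this model `Def` resolves to `def` and `gh.i` splits into a qualifier `gh` and a column `i`, neither of which exists in the CSV schema, while the quoted forms survive intact. If that is indeed the cause, building the column references inside `describe` from the literal field names, without parsing them as identifiers, would avoid the error.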
   
   ### Additional context
   
   _No response_



