johnkerl opened a new issue, #16017:
URL: https://github.com/apache/datafusion/issues/16017

   ### Describe the bug
   
   Calling `describe` on a DataFrame whose column names contain upper-case or dot characters results in
   ```
   Error: Execution("Schema error: No field named ...")
   ```
   
   ### To Reproduce
   
   `./Cargo.toml`
   ```
   [package]
   name = "describe-with-case"
   version = "0.1.0"
   edition = "2024"
   
   [dependencies]
   "datafusion" = "47"
   tokio = { version = "1.45", features = ["rt", "rt-multi-thread", "macros"] }
   ```
   
   `src/main.rs`
   ```
   use std::env;
   use datafusion::error::Result;
   use datafusion::execution::context::SessionContext;
   use datafusion::prelude::CsvReadOptions;
   
   #[tokio::main]
   async fn main() -> Result<()> {
       let args: Vec<String> = env::args().collect();
       let ctx = SessionContext::new();
       for arg in args.iter().skip(1) {
           println!();
           println!("Filename: {arg}");
   
           let df = ctx.read_csv(arg, CsvReadOptions::new()).await?;
           let stat = df.describe().await?.collect().await?;
           println!("{stat:?}");
       }
   
       Ok(())
   }
   ```
   
   `./desc-good.csv`
   ```
   abc,def,ghi
   1,2,3
   4,5,6
   7,8,9
   ```
   
   `./desc-bad.csv`
   ```
   abc,Def,gh.i
   1,2,3
   4,5,6
   7,8,9
   ```
   
   ### Expected behavior
   
   With column names `abc,def,ghi` we see
   
   `cargo run ./desc-good.csv`
   ```
   Argument ./desc-good.csv
   [RecordBatch { schema: Schema { fields: [Field { name: "describe", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "abc", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "def", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "ghi", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [StringArray
   [
     "count",
     "null_count",
     "mean",
     "std",
     "min",
     "max",
     "median",
   ], PrimitiveArray<Float64>
   [
     3.0,
     0.0,
     4.0,
     3.0,
     1.0,
     7.0,
     4.0,
   ], PrimitiveArray<Float64>
   [
     3.0,
     0.0,
     5.0,
     3.0,
     2.0,
     8.0,
     5.0,
   ], PrimitiveArray<Float64>
   [
     3.0,
     0.0,
     6.0,
     3.0,
     3.0,
     9.0,
     6.0,
   ]], row_count: 7 }]
   ```
   
   With column names `abc,Def,gh.i` I would expect similar output, but I actually see:
   
   `cargo run ./desc-bad.csv`
   ```
   Argument ./desc-bad.csv
   Error: Execution("Schema error: No field named def. Valid fields are \"?table?\".abc, \"?table?\".\"Def\", \"?table?\".\"gh.i\".")
   ```
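
One possible reading of the error, not confirmed against the DataFusion source: the message asks for `def` while the valid fields list `"Def"` and `"gh.i"` in quotes, which is what you would see if the column name were parsed as an unquoted SQL identifier (lowercased, and split on `.`) instead of being used verbatim. The sketch below is a standalone toy model of that behavior using only the standard library. `parse_identifier` and `quote_identifier` are hypothetical names invented for illustration and are not DataFusion APIs.

```rust
/// Toy model of SQL-style identifier parsing (an assumption, not DataFusion code):
/// unquoted parts are lowercased and '.' separates qualifiers, while
/// double-quoted parts are kept verbatim, dots and case included.
/// Embedded (escaped) double quotes are not handled.
fn parse_identifier(input: &str) -> Vec<String> {
    let mut parts = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut was_quoted = false;
    for ch in input.chars() {
        match ch {
            '"' => {
                in_quotes = !in_quotes;
                was_quoted = true;
            }
            // A dot outside quotes ends the current part.
            '.' if !in_quotes => {
                parts.push(finish(&current, was_quoted));
                current.clear();
                was_quoted = false;
            }
            _ => current.push(ch),
        }
    }
    parts.push(finish(&current, was_quoted));
    parts
}

/// Lowercase a part unless it was written inside double quotes.
fn finish(part: &str, was_quoted: bool) -> String {
    if was_quoted {
        part.to_string()
    } else {
        part.to_lowercase()
    }
}

/// Wrap a raw column name in double quotes so the toy parser keeps it verbatim.
/// Assumes the name contains no double quotes of its own.
fn quote_identifier(name: &str) -> String {
    format!("\"{name}\"")
}

fn main() {
    // The three column names from desc-bad.csv.
    for name in ["abc", "Def", "gh.i"] {
        let bare = parse_identifier(name);
        let quoted = parse_identifier(&quote_identifier(name));
        println!("{name}: unquoted -> {bare:?}, quoted -> {quoted:?}");
    }
}
```

Under this model `abc` survives either way, while `Def` and `gh.i` only round-trip when quoted, which matches the pattern of which files work and which fail above.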
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

