johnkerl opened a new issue, #16017:
URL: https://github.com/apache/datafusion/issues/16017
### Describe the bug
Calling `describe` on a DataFrame whose column names contain any upper-case letters or dots
fails with
```
Error: Execution("Schema error: No field named ...")
```
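The trigger condition stated above (an ASCII upper-case letter or a `.` anywhere in a column name) can be written as a small predicate. This is a sketch for spotting affected CSV headers before calling `describe`. The helper names `triggers_describe_bug` and `affected_columns` are hypothetical and not part of DataFusion. The header split is a plain comma split with no CSV quoting support, which is an assumption that holds for the two files in this report.

```rust
/// Hypothetical helper (not a DataFusion API): true if the column name contains
/// an ASCII upper-case letter or a dot, the two triggers reported in this issue.
fn triggers_describe_bug(name: &str) -> bool {
    name.chars().any(|c| c.is_ascii_uppercase() || c == '.')
}

/// Hypothetical helper: splits a CSV header line on commas (no quoted-field
/// support) and returns the column names that match the trigger condition.
fn affected_columns(header: &str) -> Vec<&str> {
    header
        .split(',')
        .map(str::trim)
        .filter(|name| triggers_describe_bug(name))
        .collect()
}

fn main() {
    // Headers of ./desc-good.csv and ./desc-bad.csv from the reproduction.
    for header in ["abc,def,ghi", "abc,Def,gh.i"] {
        println!("{header}: {:?}", affected_columns(header));
    }
}
```

With the headers from this report, `desc-good.csv` yields no affected columns, while `desc-bad.csv` yields `Def` and `gh.i`.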
### To Reproduce
`./Cargo.toml`
```toml
[package]
name = "describe-with-case"
version = "0.1.0"
edition = "2024"
[dependencies]
"datafusion" = "47"
tokio = { version = "1.45", features = ["rt", "rt-multi-thread", "macros"] }
```
`src/main.rs`
```rust
use std::env;

use datafusion::error::Result;
use datafusion::execution::context::SessionContext;
use datafusion::prelude::CsvReadOptions;

#[tokio::main]
async fn main() -> Result<()> {
    let args: Vec<String> = env::args().collect();
    let ctx = SessionContext::new();
    // Run `describe` on every CSV file passed on the command line.
    for arg in args.iter().skip(1) {
        println!();
        println!("Filename: {arg}");
        let df = ctx.read_csv(arg, CsvReadOptions::new()).await?;
        let stat = df.describe().await?.collect().await?;
        println!("{stat:?}");
    }
    Ok(())
}
```
`./desc-good.csv`
```csv
abc,def,ghi
1,2,3
4,5,6
7,8,9
```
`./desc-bad.csv`
```csv
abc,Def,gh.i
1,2,3
4,5,6
7,8,9
```
### Expected behavior
With column names `abc,def,ghi`, running
`cargo run ./desc-good.csv`
```
Argument ./desc-good.csv
[RecordBatch { schema: Schema { fields: [Field { name: "describe",
data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata:
{} }, Field { name: "abc", data_type: Float64, nullable: true, dict_id: 0,
dict_is_ordered: false, metadata: {} }, Field { name: "def", data_type:
Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} },
Field { name: "ghi", data_type: Float64, nullable: true, dict_id: 0,
dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [StringArray
[
"count",
"null_count",
"mean",
"std",
"min",
"max",
"median",
], PrimitiveArray<Float64>
[
3.0,
0.0,
4.0,
3.0,
1.0,
7.0,
4.0,
], PrimitiveArray<Float64>
[
3.0,
0.0,
5.0,
3.0,
2.0,
8.0,
5.0,
], PrimitiveArray<Float64>
[
3.0,
0.0,
6.0,
3.0,
3.0,
9.0,
6.0,
]], row_count: 7 }]
```
With column names `abc,Def,gh.i` I would expect similar output, but I actually see:
`cargo run ./desc-bad.csv`
```
Argument ./desc-bad.csv
Error: Execution("Schema error: No field named def. Valid fields are
\"?table?\".abc, \"?table?\".\"Def\", \"?table?\".\"gh.i\".")
```
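The error asks for `def` while the schema lists `"Def"` and `"gh.i"` in quotes. That pattern suggests the column name was normalized like an unquoted SQL identifier before lookup. DataFusion's documentation for `col()` states that unquoted identifiers are lower-cased and that double-quoted identifiers are kept as written. Whether `describe` builds its expressions with `col()` internally is an assumption that this issue does not confirm. The sketch below is a simplified model of that normalization rule, not DataFusion's parser. `normalize_identifier` is a hypothetical name, and escaped quotes (`""`) and error handling are deliberately left out.

```rust
/// Simplified model (an assumption, not DataFusion's actual parser) of how a
/// string is turned into identifier parts:
/// - unquoted text is split on '.' and lower-cased,
/// - text inside double quotes is kept verbatim, dots included.
fn normalize_identifier(input: &str) -> Vec<String> {
    let mut parts = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    for c in input.chars() {
        match c {
            // A double quote toggles quoted mode and is not kept.
            '"' => in_quotes = !in_quotes,
            // An unquoted dot separates a qualifier from a column name.
            '.' if !in_quotes => parts.push(std::mem::take(&mut current)),
            // Quoted characters are preserved exactly.
            _ if in_quotes => current.push(c),
            // Unquoted characters are lower-cased.
            _ => current.push(c.to_ascii_lowercase()),
        }
    }
    parts.push(current);
    parts
}

fn main() {
    for name in ["abc", "Def", "gh.i", "\"Def\"", "\"gh.i\""] {
        println!("{name} -> {:?}", normalize_identifier(name));
    }
}
```

Under this model, `Def` becomes `def` and `gh.i` becomes qualifier `gh` plus column `i`, and neither exists in the schema. The quoted forms resolve to the original names. For expressions written by hand, the DataFusion docs describe `ident("Def")` or a quoted `col(r#""Def""#)` as ways to avoid this normalization.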
### Additional context
_No response_