edandresvan opened a new issue, #6845:
URL: https://github.com/apache/arrow-datafusion/issues/6845
### Describe the bug
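`DataFrame::with_column_renamed()` does not rename single-word column names that start with an uppercase letter, such as `Package`. Names that contain a space, such as `City Name`, are renamed correctly.
For instance: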
```rust
df = df.with_column_renamed("Package", "package")?; // Produces "Package". Should be "package".
df = df.with_column_renamed("City Name", "city_name")?; // Produces "city_name". OK.
```
### To Reproduce
I have the following CSV table (`data/pumpkins.csv`):
|City Name|Package |Variety |Date |Low Price|High Price|Mostly Low|
|---------|------------|-----------|-------|---------|----------|----------|
|BALTIMORE|24 inch bins|HOWDEN TYPE|9/24/16|160 |160 |160 |
|BALTIMORE|24 inch bins|HOWDEN TYPE|9/24/16|160 |160 |160 |
|BALTIMORE|24 inch bins|HOWDEN TYPE|11/5/16|90 |100 |90 |
I load the CSV data with this code:
```rust
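use datafusion::prelude::*;

// Enable information_schema so tables can be inspected with SQL.
let config = SessionConfig::new().with_information_schema(true);
// Newer DataFusion releases name this `SessionContext::new_with_config`.
let context = SessionContext::with_config(config);

// Pass these options to `read_csv` so the header and inference settings apply.
let csv_options = CsvReadOptions::new()
    .has_header(true)
    .schema_infer_max_records(2000);
let mut df = context.read_csv("data/pumpkins.csv", csv_options).await?;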
```
Then I try to convert the column names to lowercase and replace whitespace with underscores using this code:
```rust
use regex::Regex;

// Collect the current column names.
let field_names: Vec<String> = df
    .schema()
    .fields()
    .iter()
    .map(|f| f.name().to_string())
    .collect();
println!("{:?}", field_names);

// Trim, collapse each run of whitespace into one underscore, and lowercase.
let re = Regex::new(r"\s+").unwrap();
for field in field_names {
    let new_field = re.replace_all(field.trim(), "_").to_lowercase();
    println!("{} -> {}", field, new_field);
    df = df.with_column_renamed(field, &new_field)?;
}

// Renaming directly, outside the loop, has no effect either.
df = df.with_column_renamed("Package", "package")?;

let field_names: Vec<String> = df
    .schema()
    .fields()
    .iter()
    .map(|f| f.name().to_string())
    .collect();
println!("{:?}", field_names);
```
And I get these results:
```text
["City Name", "Package", "Variety", "Date", "Low Price", "High Price", "Mostly Low"]
City Name -> city_name
Package -> package
Variety -> variety
Date -> date
Low Price -> low_price
High Price -> high_price
Mostly Low -> mostly_low
["city_name", "Package", "Variety", "Date", "low_price", "high_price", "mostly_low"]
```
As you can see, `DataFrame::with_column_renamed()` does not rename the single-word columns that start with an uppercase letter (`Package`, `Variety`, `Date`), even when it is called directly outside the `for` loop:
```rust
df = df.with_column_renamed("Package", "package")?;
```
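The output is consistent with the old name being parsed as a SQL identifier: an unquoted identifier is folded to lowercase, so `Package` becomes `package` and matches no column, while `City Name` is not a valid bare identifier and is apparently used verbatim. The sketch below is a simplified, hypothetical model of that resolution; `resolve_name` and `rename_would_match` are illustrative helpers, not DataFusion APIs, and the model is inferred from the behavior in this report rather than taken from the DataFusion source.

```rust
/// Simplified, hypothetical model of how a column-name string might be
/// resolved if it is parsed as a SQL identifier (not DataFusion's real code).
fn resolve_name(raw: &str) -> String {
    let is_quoted = raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"');
    if is_quoted {
        // Quoted identifier: keep the case and unescape doubled quotes.
        raw[1..raw.len() - 1].replace("\"\"", "\"")
    } else if raw.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
        // Unquoted single identifier: folded to lowercase.
        raw.to_ascii_lowercase()
    } else {
        // Not a valid bare identifier (e.g. contains a space): used verbatim.
        raw.to_string()
    }
}

/// Returns true if the resolved name matches an existing column exactly.
fn rename_would_match(schema: &[&str], raw: &str) -> bool {
    let resolved = resolve_name(raw);
    schema.iter().any(|name| *name == resolved)
}
```

Under this model, `"Package"` resolves to `package` and finds nothing, `"City Name"` matches as-is, and `"\"Package\""` matches because quoting preserves the case, which lines up with the three outcomes described in this issue.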
### Expected behavior
I would like the function to rename the column regardless of case:
```rust
df = df.with_column_renamed("Package", "package")?; // should produce "package"
df = df.with_column_renamed("PACKAGE", "package")?; // should produce "package"
```
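One possible lookup rule for this request, sketched on a plain list of names: prefer an exact match, and fall back to a case-insensitive match only when exactly one column qualifies, so that schemas with both `Col` and `COL` are not silently resolved to the wrong one. `find_column` is a hypothetical helper, not part of DataFusion.

```rust
/// Hypothetical lookup: exact match first, otherwise a single
/// case-insensitive match. Returns the column index, or None.
fn find_column(names: &[&str], wanted: &str) -> Option<usize> {
    // An exact match always wins.
    if let Some(i) = names.iter().position(|n| *n == wanted) {
        return Some(i);
    }
    // Fall back to case-insensitive matching, but only if it is unambiguous.
    let wanted_lower = wanted.to_lowercase();
    let mut matches = names
        .iter()
        .enumerate()
        .filter(|(_, n)| n.to_lowercase() == wanted_lower)
        .map(|(i, _)| i);
    match (matches.next(), matches.next()) {
        (Some(i), None) => Some(i),
        _ => None,
    }
}
```

Rejecting ambiguous matches keeps the rename predictable; an implementation could instead return an error naming the candidate columns.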
To get my desired output from `DataFrame::with_column_renamed()`, I had to wrap each name in escaped double quotes with `format!` inside the `for` loop:
```rust
// (...)
for field in field_names {
//(...)
df = df.with_column_renamed(format!("\"{}\"", field), &new_field)?;
}
// (...)
```
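The `format!` workaround produces an invalid name if a column itself contains a double quote. A slightly more robust sketch escapes embedded quotes by doubling them, the standard SQL convention for quoted identifiers; whether DataFusion's name parsing accepts doubled quotes in this call is an assumption worth checking. `quote_identifier` is a hypothetical helper.

```rust
/// Hypothetical helper: wrap a column name in double quotes so it is read
/// as a quoted (case-preserving) identifier. Embedded double quotes are
/// escaped by doubling them, as in standard SQL.
fn quote_identifier(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}
```

In the loop it would replace the `format!` call: `df = df.with_column_renamed(quote_identifier(&field), &new_field)?;`.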
`DataFrame::with_column_renamed()` should rename the column exactly as requested. In my case, I want this output:
```text
["City Name", "Package", "Variety", "Date", "Low Price", "High Price", "Mostly Low"]
City Name -> city_name
Package -> package
Variety -> variety
Date -> date
Low Price -> low_price
High Price -> high_price
Mostly Low -> mostly_low
["city_name", "package", "variety", "date", "low_price", "high_price", "mostly_low"]
```
|city_name|package |variety |date |low_price|high_price|mostly_low|
|---------|------------|-----------|-------|---------|----------|----------|
|BALTIMORE|24 inch bins|HOWDEN TYPE|9/24/16|160 |160 |160 |
|BALTIMORE|24 inch bins|HOWDEN TYPE|9/24/16|160 |160 |160 |
|BALTIMORE|24 inch bins|HOWDEN TYPE|11/5/16|90 |100 |90 |
I think it would also be useful to add a function that normalizes all column names to lowercase with underscores, for example:
```rust
// Proposed (hypothetical) methods:
dataframe.to_lowercase() // produces ["city_name", "date", "low_price", "high_price"]
CsvReadOptions::new().to_lowercase(true) // produces ["city_name", "date", "low_price", "high_price"]
```
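Until such an option exists, the normalization itself can be written as a plain function over the schema's column names. A minimal sketch, assuming the same rules as the loop in this report (trim, collapse whitespace into a single underscore, lowercase); `normalize_column_name` and `rename_pairs` are hypothetical helpers, not DataFusion APIs. It uses `split_whitespace` instead of the `regex` crate so it needs only the standard library.

```rust
/// Hypothetical helper: trim the name, collapse each run of whitespace
/// into one underscore, and lowercase the result.
fn normalize_column_name(name: &str) -> String {
    name.split_whitespace()
        .collect::<Vec<_>>()
        .join("_")
        .to_lowercase()
}

/// Builds (old, new) rename pairs, skipping names that are already normalized.
fn rename_pairs(names: &[&str]) -> Vec<(String, String)> {
    names
        .iter()
        .map(|old| (old.to_string(), normalize_column_name(old)))
        .filter(|(old, new)| old != new)
        .collect()
}
```

Each pair could then be applied with `with_column_renamed`, quoting the old name as in the `format!` workaround above.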