edandresvan opened a new issue, #6845:
URL: https://github.com/apache/arrow-datafusion/issues/6845

   ### Describe the bug
   
   The function DataFrame.with_column_renamed() cannot rename a single-word column whose name contains uppercase letters to its lowercase form.
   For instance,
   
   ```rust
   df.with_column_renamed("Package", "package")?; // Produces "Package". Should be "package".
   df.with_column_renamed("City Name", "city_name")?; // Produces "city_name". OK.
   ```
   
   
   
   ### To Reproduce
   
   I have the following CSV table
   
   |City Name|Package     |Variety    |Date   |Low Price|High Price|Mostly Low|
   |---------|------------|-----------|-------|---------|----------|----------|
   |BALTIMORE|24 inch bins|HOWDEN TYPE|9/24/16|160      |160       |160       |
   |BALTIMORE|24 inch bins|HOWDEN TYPE|9/24/16|160      |160       |160       |
   |BALTIMORE|24 inch bins|HOWDEN TYPE|11/5/16|90       |100       |90        |
   
   I load the CSV data with this code:
   
   ```rust
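   // Enable the information schema on the session.
   let config = SessionConfig::new().with_information_schema(true);
   // Assumption: the report does not show how `context` is created;
   // building it from `config` is the likely intent.
   let context = SessionContext::with_config(config);

   let csv_options = CsvReadOptions::new()
     .has_header(true)
     .schema_infer_max_records(2000);

   // Pass the configured options instead of a fresh CsvReadOptions::new().
   let mut df = context.read_csv("data/pumpkins.csv", csv_options).await?;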
   ```
   
   Then, I try to convert the column names to lowercase and replace whitespace with underscores using this code:
   
   ```rust
   let field_names: Vec<String> = df.schema().fields().iter().map(|f| f.name().to_string()).collect::<Vec<String>>();
   println!("{:?}", field_names);
   
   let re = Regex::new(r"\s+").unwrap();
   
   for field in field_names {
     // Trim, replace each run of whitespace with one underscore, then lowercase.
     let new_field: String = re.replace_all(field.trim(), "_").to_lowercase();
   
     println!("{} -> {}", field, new_field);
   
     df = df.with_column_renamed(field, &new_field)?;
   }
   
   df = df.with_column_renamed("Package", "package")?;
   
   let field_names: Vec<String> = df.schema().fields().iter().map(|f| f.name().to_string()).collect::<Vec<String>>();
   
   println!("{:?}", field_names);
   ```
   
   And I get these results:
   
   ```text
   ["City Name", "Package", "Variety", "Date", "Low Price", "High Price", "Mostly Low"]
   City Name -> city_name
   Package -> package
   Variety -> variety
   Date -> date
   Low Price -> low_price
   High Price -> high_price
   Mostly Low -> mostly_low
   ["city_name", "Package", "Variety", "Date", "low_price", "high_price", "mostly_low"]
   ```
   
   As you can see, DataFrame.with_column_renamed() does not rename the single-word columns that start with an uppercase letter ("Package", "Variety", "Date"), even when it is called outside the for loop:
   
   ```rust
   df = df.with_column_renamed("Package", "package")?;
   ```
   
   
   ### Expected behavior
   
   I would like the function to change the column name regardless of case:
   
   ```rust
   df = df.with_column_renamed("Package", "package")?; // produces "package"
   df = df.with_column_renamed("PACKAGE", "package")?; // produces "package"
   ```
   
   To get my desired output from DataFrame.with_column_renamed(), I had to wrap the old column name in escaped double quotes with format! inside the for loop:
   
   ```rust
   // (...)
   for field in field_names {
     //(...)
     df = df.with_column_renamed(format!("\"{}\"", field), &new_field)?;
   }
   // (...)
   ```
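   
   The quoting in this workaround can be factored into a small helper. This is only a sketch: `quote_identifier` is a hypothetical name, not part of DataFusion, and it assumes the old column name is parsed like a SQL identifier, where double quotes preserve case and an embedded double quote is escaped by doubling it.
   
   ```rust
   /// Hypothetical helper (not a DataFusion API): wraps a column name in
   /// double quotes so that it is treated as a case-sensitive identifier.
   /// Assumption: embedded double quotes are escaped by doubling them,
   /// as in standard SQL quoted identifiers.
   fn quote_identifier(name: &str) -> String {
       format!("\"{}\"", name.replace('"', "\"\""))
   }
   ```
   
   With it, the call in the loop would read `df.with_column_renamed(quote_identifier(&field), &new_field)?`.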
   
   DataFrame.with_column_renamed() should rename the column properly. In my case, I want this output:
   
   ```text
   ["City Name", "Package", "Variety", "Date", "Low Price", "High Price", "Mostly Low"]
   City Name -> city_name
   Package -> package
   Variety -> variety
   Date -> date
   Low Price -> low_price
   High Price -> high_price
   Mostly Low -> mostly_low
   ["city_name", "package", "variety", "date", "low_price", "high_price", "mostly_low"]
   ```
   
   |city_name|package     |variety    |date   |low_price|high_price|mostly_low|
   |---------|------------|-----------|-------|---------|----------|----------|
   |BALTIMORE|24 inch bins|HOWDEN TYPE|9/24/16|160      |160       |160       |
   |BALTIMORE|24 inch bins|HOWDEN TYPE|9/24/16|160      |160       |160       |
   |BALTIMORE|24 inch bins|HOWDEN TYPE|11/5/16|90       |100       |90        |
   
   
   I think it would also be useful to add a function that normalizes all column names to lowercase with underscores:
   
   ```rust
   // Proposed API; neither method exists yet.
   dataframe.to_lowercase() // produces ["city_name", "date", "low_price", "high_price"]
   CsvReadOptions::new().to_lowercase(true) // produces ["city_name", "date", "low_price", "high_price"]
   ```
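   
   Until something like this exists, the normalization itself can be done without a regex. The following is a minimal sketch using only the standard library; `normalize_column_name` is a hypothetical name, not an existing DataFusion function.
   
   ```rust
   /// Hypothetical helper (not a DataFusion API): lowercases a column name
   /// and joins its whitespace-separated words with underscores.
   fn normalize_column_name(name: &str) -> String {
       name.split_whitespace()      // skips leading/trailing whitespace, collapses runs
           .map(str::to_lowercase)  // lowercase each word
           .collect::<Vec<String>>()
           .join("_")               // "city name" becomes "city_name"
   }
   ```
   
   For the header above, `normalize_column_name("City Name")` yields `city_name` and `normalize_column_name("Package")` yields `package`.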
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
