[I] Support CSV files encoded with charsets other than UTF-8 [arrow-rs]

via GitHub Mon, 23 Feb 2026 03:23:27 -0800


Rafferty97 opened a new issue, #9465:
URL: https://github.com/apache/arrow-rs/issues/9465


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   It seems that CSV files encoded with non-UTF-8 charsets, such as 
Windows-1252, are annoyingly common in the wild. It would be useful to be able 
to consume them directly via an additional configuration option.
   
   **Describe the solution you'd like**
   Add a configuration option to the CSV reader to specify a character 
encoding, defaulting to UTF-8. The implementation can make use of `encoding_rs`, 
and could be feature-gated so as not to affect users who don't need this 
functionality.
   
   **Describe alternatives you've considered**
   The only alternative I can think of is to decode the entire CSV file up 
front before reading it via Apache Arrow, but this is suboptimal for a lot of 
use cases.
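
   To illustrate the difference between decoding up front and decoding while 
reading, here is a minimal sketch of a streaming transcoder built only on the 
standard library. It is not the proposed implementation: the name 
`Latin1ToUtf8` is hypothetical, and it handles ISO-8859-1 (Latin-1) rather than 
Windows-1252, because Latin-1 bytes map one-to-one onto Unicode code points. 
Windows-1252 differs in the 0x80-0x9F range, so a real implementation would 
presumably rely on `encoding_rs` instead.

```rust
use std::io::{self, Read};

/// Hypothetical adapter: reads ISO-8859-1 (Latin-1) bytes from `inner`
/// and yields the same text encoded as UTF-8, one chunk at a time.
pub struct Latin1ToUtf8<R> {
    inner: R,
    /// Trailing byte of a two-byte UTF-8 sequence that did not fit
    /// into the caller's buffer on the previous call.
    pending: Option<u8>,
}

impl<R: Read> Latin1ToUtf8<R> {
    pub fn new(inner: R) -> Self {
        Self { inner, pending: None }
    }
}

impl<R: Read> Read for Latin1ToUtf8<R> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if out.is_empty() {
            return Ok(0);
        }
        // Flush a leftover continuation byte before reading more input.
        if let Some(byte) = self.pending.take() {
            out[0] = byte;
            return Ok(1);
        }
        // Each input byte expands to at most two output bytes, so read
        // about half of the available output space (at least one byte).
        let mut raw = [0u8; 4096];
        let want = ((out.len() + 1) / 2).min(raw.len());
        let n = self.inner.read(&mut raw[..want])?;

        let mut written = 0;
        for &byte in &raw[..n] {
            if byte < 0x80 {
                // ASCII is identical in Latin-1 and UTF-8.
                out[written] = byte;
                written += 1;
            } else {
                // U+0080..=U+00FF encode as two bytes: 110000xx 10xxxxxx.
                out[written] = 0xC0 | (byte >> 6);
                written += 1;
                let low = 0x80 | (byte & 0x3F);
                if written < out.len() {
                    out[written] = low;
                    written += 1;
                } else {
                    // Only the final byte of a call can overflow.
                    self.pending = Some(low);
                }
            }
        }
        Ok(written)
    }
}
```

   Because the adapter implements `std::io::Read`, it could in principle be 
wrapped around a file and handed to any consumer that accepts a reader, so the 
input is never held in memory as a whole. A built-in option would still be 
more convenient and would cover many more encodings.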
   
   **Additional context**
   I originally opened a similar issue in the DataFusion project, but after 
further reflection, figured it was possibly better implemented in Arrow itself.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
