Rafferty97 opened a new issue, #9465: URL: https://github.com/apache/arrow-rs/issues/9465
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
It seems that CSV files encoded with non-UTF-8 charsets, such as Windows-1252, are annoyingly common in the wild. It would be useful to be able to consume them directly via an additional configuration option.

**Describe the solution you'd like**
Add a configuration option to the CSV reader to specify a character encoding, defaulting to UTF-8. The implementation can make use of `encoding_rs`, and could be feature-gated so as not to affect users who don't need this functionality.

**Describe alternatives you've considered**
The only alternative I can think of is to decode the entire CSV file up front before reading it via Apache Arrow, but this is suboptimal for many use cases.

**Additional context**
I originally opened a similar issue in the DataFusion project, but after further reflection, figured it was possibly better implemented in Arrow itself.
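
For illustration, here is a minimal sketch of how the up-front decode could be avoided today by transcoding on the fly. It uses only the standard library and handles Windows-1252 only; the names `Windows1252Reader` and `cp1252_to_char` are hypothetical and not part of arrow-rs or `encoding_rs`. An implementation of the proposed option would presumably delegate to `encoding_rs` rather than a hand-written table.

```rust
use std::io::{self, Read};

/// Unicode code points for bytes 0x80..=0x9F in Windows-1252.
/// Assumption: the five unassigned bytes map to the matching C1 control
/// code points, as in the WHATWG Encoding Standard.
const CP1252_HIGH: [u16; 32] = [
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
];

/// Map one Windows-1252 byte to a `char` (hypothetical helper).
fn cp1252_to_char(byte: u8) -> char {
    let code = match byte {
        0x80..=0x9F => CP1252_HIGH[(byte - 0x80) as usize] as u32,
        // Everything else coincides with ISO-8859-1, i.e. U+0000..=U+00FF.
        _ => byte as u32,
    };
    // None of the mapped values is a surrogate, so this never falls back.
    char::from_u32(code).unwrap_or(char::REPLACEMENT_CHARACTER)
}

/// Wraps a reader of Windows-1252 bytes and yields UTF-8 bytes
/// (hypothetical adapter, one chunk at a time).
pub struct Windows1252Reader<R> {
    inner: R,
    pending: Vec<u8>, // UTF-8 bytes decoded but not yet handed out
    pos: usize,       // read offset into `pending`
}

impl<R: Read> Windows1252Reader<R> {
    pub fn new(inner: R) -> Self {
        Self { inner, pending: Vec::new(), pos: 0 }
    }
}

impl<R: Read> Read for Windows1252Reader<R> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if out.is_empty() {
            return Ok(0);
        }
        if self.pos == self.pending.len() {
            // Refill: read one raw chunk and transcode it to UTF-8.
            let mut raw = [0u8; 4096];
            let n = self.inner.read(&mut raw)?;
            if n == 0 {
                return Ok(0); // end of input
            }
            self.pending.clear();
            self.pos = 0;
            let mut utf8 = [0u8; 4];
            for &b in &raw[..n] {
                let encoded = cp1252_to_char(b).encode_utf8(&mut utf8);
                self.pending.extend_from_slice(encoded.as_bytes());
            }
        }
        // Hand out as much of the decoded chunk as fits in `out`.
        let n = out.len().min(self.pending.len() - self.pos);
        out[..n].copy_from_slice(&self.pending[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}
```

Because the adapter implements `std::io::Read`, it can in principle be placed in front of any consumer that accepts a `Read`, so only one chunk is held in memory at a time. Since Windows-1252 is a single-byte encoding, no decoder state needs to be carried across chunks; multi-byte encodings would need that, which is one reason to prefer `encoding_rs` in a real implementation.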
