Rafferty97 opened a new issue, #20473:
URL: https://github.com/apache/datafusion/issues/20473
### Is your feature request related to a problem or challenge?

Currently, DataFusion doesn't appear to support reading CSV files that use a non-UTF-8 encoding, such as the common ISO-8859-1. While CSV may be a terrible data format, it's also ubiquitous in the wild, and many CSV files use alternative character encodings. It would be useful to have an option to read CSV files encoded in something other than UTF-8.

### Describe the solution you'd like

Add an option to `CsvOptions` (or elsewhere) to specify the encoding of the input file, defaulting to `UTF-8`. DataFusion could then use `encoding_rs` internally to decode chunks of incoming data to UTF-8 before they are parsed.

### Describe alternatives you've considered

An alternative to depending on `encoding_rs` directly would be to expose an option that lets users provide their own decoding logic, which they would then likely delegate to `encoding_rs`. This might be desirable if the added dependency is deemed too heavy (though it could easily be put behind a feature flag).

### Additional context

_No response_
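The approach proposed under *Describe the solution you'd like* amounts to transcoding each incoming chunk to UTF-8 before the CSV parser sees it. Below is a minimal, std-only sketch of that step for ISO-8859-1 only. It deliberately does not use `encoding_rs`, and the function name `decode_latin1_chunk` is hypothetical. ISO-8859-1 is convenient for illustration because every byte maps to the Unicode code point with the same value, so no state has to be carried between chunks; multi-byte encodings would need a streaming decoder such as `encoding_rs::Decoder`.

```rust
/// Hypothetical helper: appends the UTF-8 form of an ISO-8859-1 chunk to `out`.
/// Each Latin-1 byte is the code point U+0000..=U+00FF of the same value,
/// so chunks can be decoded independently without carrying state.
fn decode_latin1_chunk(chunk: &[u8], out: &mut String) {
    out.reserve(chunk.len());
    for &b in chunk {
        // `char::from(u8)` maps the byte to the code point with that value.
        out.push(char::from(b));
    }
}

fn main() {
    // "café;naïve\n" encoded as ISO-8859-1, split into two arbitrary chunks,
    // as they might arrive from a file or object store.
    let chunks: [&[u8]; 2] = [b"caf\xE9;na", b"\xEFve\n"];

    let mut utf8 = String::new();
    for chunk in chunks {
        decode_latin1_chunk(chunk, &mut utf8);
    }

    // The transcoded text could now be handed to a UTF-8-only CSV reader.
    assert_eq!(utf8, "café;naïve\n");
    println!("{utf8}");
}
```

One design detail worth noting: `encoding_rs` follows the WHATWG Encoding Standard, under which the label `ISO-8859-1` resolves to windows-1252. The two differ only in bytes 0x80 to 0x9F, so an encoding option that accepts labels would inherit that mapping.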
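The user-supplied decoding hook described under *Describe alternatives you've considered* might look roughly like the sketch below. The trait `ChunkDecoder`, its `decode_chunk` method, and `Utf16LeDecoder` are hypothetical names, not part of DataFusion's API, and the example uses only the standard library rather than `encoding_rs`. It illustrates why such a hook would need to be stateful: with a multi-byte encoding like UTF-16LE, a chunk boundary can split a code unit or a surrogate pair, so the decoder must keep the unfinished tail until the next call.

```rust
/// Hypothetical hook: turns raw bytes into UTF-8 text one chunk at a time.
/// `last` marks the final chunk so any leftover bytes can be flushed.
trait ChunkDecoder {
    fn decode_chunk(&mut self, input: &[u8], last: bool, out: &mut String);
}

/// Example user implementation for UTF-16LE.
#[derive(Default)]
struct Utf16LeDecoder {
    pending: Vec<u8>, // bytes not yet decoded (odd byte and/or high surrogate)
}

impl ChunkDecoder for Utf16LeDecoder {
    fn decode_chunk(&mut self, input: &[u8], last: bool, out: &mut String) {
        self.pending.extend_from_slice(input);

        // Collect every complete 16-bit little-endian code unit.
        let mut units: Vec<u16> = self
            .pending
            .chunks_exact(2)
            .map(|p| u16::from_le_bytes([p[0], p[1]]))
            .collect();
        let mut consumed = units.len() * 2;

        // Hold back a trailing high surrogate until its low half arrives.
        if !last {
            if let Some(&u) = units.last() {
                if (0xD800..=0xDBFF).contains(&u) {
                    units.pop();
                    consumed -= 2;
                }
            }
        }

        // Unpaired surrogates become U+FFFD, as in a lossy decoder.
        out.extend(
            char::decode_utf16(units).map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER)),
        );
        self.pending.drain(..consumed);

        // A dangling odd byte at the end of the input is also malformed.
        if last && !self.pending.is_empty() {
            out.push(char::REPLACEMENT_CHARACTER);
            self.pending.clear();
        }
    }
}

fn main() {
    // "a,😀\n" encoded as UTF-16LE; the emoji is a surrogate pair.
    let bytes: Vec<u8> = "a,😀\n"
        .encode_utf16()
        .flat_map(|u| u.to_le_bytes())
        .collect();

    // Feed 3-byte chunks so boundaries split a code unit and the pair.
    let mut decoder = Utf16LeDecoder::default();
    let mut text = String::new();
    let chunks: Vec<&[u8]> = bytes.chunks(3).collect();
    for (i, chunk) in chunks.iter().enumerate() {
        decoder.decode_chunk(chunk, i + 1 == chunks.len(), &mut text);
    }

    assert_eq!(text, "a,😀\n");
    println!("{text:?}");
}
```

In practice a user implementation would more likely wrap a streaming `encoding_rs` decoder than hand-roll one; the point of the sketch is only the shape of the hook (chunked input, an explicit end-of-input flag, and state kept across calls).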
