Rafferty97 opened a new pull request, #20626:
URL: https://github.com/apache/datafusion/pull/20626

   ## Which issue does this PR close?
   
   Closes https://github.com/apache/datafusion/issues/20473
   
   ## Rationale for this change
   
   CSV is a ubiquitous file format, and many CSV files are encoded in Windows-1252 or other non-UTF-8 encodings. It would be useful to be able to read them in DataFusion.
   
   ## What changes are included in this PR?
   
   * Adds a configuration option to the CSV reader to specify an encoding
   * Adds an optional dependency on `encoding_rs` to do the actual decoding
   * Refactors `CsvSource` somewhat to aid the implementation
   * Removes the return value from `DecoderDeserializer::digest`, as it was misleading (call sites were ignoring it)
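For a single-byte encoding such as Windows-1252, the decoding step is conceptually simple: each input byte maps to exactly one Unicode character before the CSV parser sees the text. The PR delegates this work to `encoding_rs`. The std-only sketch below instead hand-rolls a Windows-1252 table purely for illustration. `decode_windows_1252` is a hypothetical helper and is not part of DataFusion or `encoding_rs`.

```rust
/// Code points for bytes 0x80..=0x9F in Windows-1252, following the WHATWG
/// Encoding Standard index. Every other byte maps to the identical code point.
const HIGH_C1: [char; 32] = [
    '\u{20AC}', '\u{0081}', '\u{201A}', '\u{0192}', '\u{201E}', '\u{2026}', '\u{2020}', '\u{2021}',
    '\u{02C6}', '\u{2030}', '\u{0160}', '\u{2039}', '\u{0152}', '\u{008D}', '\u{017D}', '\u{008F}',
    '\u{0090}', '\u{2018}', '\u{2019}', '\u{201C}', '\u{201D}', '\u{2022}', '\u{2013}', '\u{2014}',
    '\u{02DC}', '\u{2122}', '\u{0161}', '\u{203A}', '\u{0153}', '\u{009D}', '\u{017E}', '\u{0178}',
];

/// Hypothetical helper: decode Windows-1252 bytes into a UTF-8 `String`.
/// Because the encoding is single-byte, no state has to be carried between
/// chunks of input.
fn decode_windows_1252(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&b| match b {
            // 0x80..=0x9F differ from Latin-1, so look them up in the table.
            0x80..=0x9F => HIGH_C1[(b - 0x80) as usize],
            // All other bytes coincide with U+0000..=U+00FF.
            _ => char::from(b),
        })
        .collect()
}

fn main() {
    // A small CSV in Windows-1252: 'é' is 0xE9 and '€' is 0x80.
    let raw = b"name,price\ncaf\xE9,\x805\n";
    let text = decode_windows_1252(raw);
    print!("{text}");
}
```

Multi-byte encodings such as Shift_JIS are harder, because a single character can straddle the boundary between two input chunks. A streaming reader therefore needs a stateful decoder, such as the `Decoder` type that `encoding_rs` provides, rather than a per-byte table like this one.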
   
   ## Are these changes tested?
   
   I have added one unit test that attempts to read a Shift_JIS-encoded CSV file. More tests are probably needed, but I may need some guidance on this. I'm also running into issues getting the test suite to run locally on my Windows machine.
   
   ## Are there any user-facing changes?
   
   Adds a new field to `CsvOptions`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
