Re: [PR] Support non-UTF-8 encoded CSV files [datafusion]

via GitHub Wed, 24 Jun 2026 17:58:16 -0700


Rafferty97 commented on PR #20626:
URL: https://github.com/apache/datafusion/pull/20626#issuecomment-4794901652


   Hi @alamb, I've had a go at implementing your suggestion of a 
`PreReadDecoder` API, but I'm struggling to figure out the best place to put it.
   
   This feels like a feature that should be agnostic to both where the data 
comes from (filesystem, S3, whatever) and what file format consumes that data, 
but it seems that implementors of `FileOpener` interact with the object store 
abstraction directly, leaving nowhere to insert this new step in the pipeline.
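   
   As a rough illustration of the kind of step meant here, the sketch below
wraps any source of byte chunks and yields UTF-8 chunks. Every name in it
(`PreReadDecoder` as a trait, `Latin1Decoder`, `Decoded`) is hypothetical and
none of it is an existing DataFusion or object store API. It uses ISO-8859-1
only because that encoding is stateless and needs no external crate; a real
decoder would have to buffer multi-byte sequences split across chunks.

```rust
// Hypothetical sketch: none of these names are DataFusion APIs.

/// A decoding step that turns chunks of encoded bytes into chunks of UTF-8.
trait PreReadDecoder {
    /// Decode one chunk. Stateful encodings would keep any trailing partial
    /// character in `self` until the next call.
    fn decode_chunk(&mut self, input: &[u8]) -> Vec<u8>;
}

/// ISO-8859-1 decoder: every input byte maps to the Unicode code point with
/// the same numeric value, so no state is carried between chunks.
struct Latin1Decoder;

impl PreReadDecoder for Latin1Decoder {
    fn decode_chunk(&mut self, input: &[u8]) -> Vec<u8> {
        let mut out = String::with_capacity(input.len());
        for &b in input {
            // `u8 as char` yields U+0000..=U+00FF, which is exactly Latin-1.
            out.push(b as char);
        }
        out.into_bytes()
    }
}

/// Adapter that sits between a chunk source and a format reader. It knows
/// nothing about where the chunks come from or who consumes them.
struct Decoded<I, D> {
    chunks: I,
    decoder: D,
}

impl<I, D> Iterator for Decoded<I, D>
where
    I: Iterator<Item = Vec<u8>>,
    D: PreReadDecoder,
{
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Vec<u8>> {
        let chunk = self.chunks.next()?;
        Some(self.decoder.decode_chunk(&chunk))
    }
}
```

   The point of the adapter shape is that a CSV or JSON reader would only ever
see UTF-8 chunks, regardless of whether they came from a local file or S3.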
   
   I guess the most expedient path would be to implement it in each file 
format separately, in a similar manner to how compression currently works, but 
I don't think that's the best long-term solution.
   
   Keen to hear your thoughts if you have any ideas :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


