brunal commented on issue #9465:
URL: https://github.com/apache/arrow-rs/issues/9465#issuecomment-3976683283

   I think this can be achieved without any change to arrow, using the
streaming decoders from the `encoding_rs_*` crates.
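   ```rust
   use std::fs::File;

   use arrow::csv::ReaderBuilder;
   use arrow::datatypes::SchemaRef;
   use arrow::record_batch::RecordBatch;

   // Basic: the file is already UTF-8.
   fn read_utf8(filename: &str, schema: SchemaRef) -> RecordBatch {
       let file = File::open(filename).unwrap();
       // `SchemaRef` is already an `Arc<Schema>`, so it is passed as-is.
       let mut csv = ReaderBuilder::new(schema).build(file).unwrap();
       // Return only the first batch.
       csv.next().unwrap().unwrap()
   }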
   
   // With a different encoding: decode to UTF-8 on the fly while reading.
   fn read_other_encoding(
       filename: &str,
       schema: SchemaRef,
       // `Encoding::new_decoder` takes `&'static self`.
       encoding: &'static encoding_rs::Encoding,
   ) -> RecordBatch {
       let file = File::open(filename).unwrap();
       // Assumes `DecodingReader::new` takes a buffered reader and a decoder.
       let decoded = encoding_rs_rw::DecodingReader::new(
           std::io::BufReader::new(file),
           encoding.new_decoder(),
       );
       let mut csv = ReaderBuilder::new(schema).build(decoded).unwrap();
       csv.next().unwrap().unwrap()
   }
   ```
   
   The change is minimal and doesn't require decoding the whole file up
front. I believe that's a better solution than adding a crate feature (I don't
like crate features) and another dependency.
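
To illustrate why no upfront decoding is needed, here is a minimal std-only sketch of the same idea: a `Read` adapter that converts bytes to UTF-8 as they are pulled through it. `Latin1ToUtf8` and `decode_all` are hypothetical names, and the hand-rolled ISO-8859-1 decoder only stands in for what `encoding_rs_rw` would do for arbitrary encodings; it is not that crate's implementation.

```rust
use std::io::{self, Read};

/// Hypothetical adapter: decodes ISO-8859-1 (Latin-1) bytes from `inner`
/// into UTF-8 incrementally, one `read` call at a time.
struct Latin1ToUtf8<R: Read> {
    inner: R,
    /// Trailing byte of a two-byte UTF-8 sequence that did not fit
    /// into the caller's buffer on the previous call.
    pending: Option<u8>,
}

impl<R: Read> Latin1ToUtf8<R> {
    fn new(inner: R) -> Self {
        Self { inner, pending: None }
    }
}

impl<R: Read> Read for Latin1ToUtf8<R> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if out.is_empty() {
            return Ok(0);
        }
        // Flush a leftover byte first so no output is lost.
        if let Some(b) = self.pending.take() {
            out[0] = b;
            return Ok(1);
        }
        // Each Latin-1 byte becomes at most two UTF-8 bytes, so read
        // about half as many input bytes as there is output space.
        let mut chunk = [0u8; 4096];
        let want = ((out.len() + 1) / 2).min(chunk.len());
        let n = self.inner.read(&mut chunk[..want])?;
        let mut written = 0;
        for &byte in &chunk[..n] {
            if byte < 0x80 {
                // ASCII maps to itself.
                out[written] = byte;
                written += 1;
            } else {
                // U+0080..=U+00FF encode as two UTF-8 bytes.
                out[written] = 0xC0 | (byte >> 6);
                written += 1;
                let trailing = 0x80 | (byte & 0x3F);
                if written < out.len() {
                    out[written] = trailing;
                    written += 1;
                } else {
                    // Only the last input byte can overflow, by one byte.
                    self.pending = Some(trailing);
                }
            }
        }
        Ok(written)
    }
}

/// Hypothetical helper for demonstration: drains the adapter into a `String`.
fn decode_all(bytes: &[u8]) -> String {
    let mut text = String::new();
    Latin1ToUtf8::new(bytes)
        .read_to_string(&mut text)
        .expect("adapter output is valid UTF-8");
    text
}
```

Because the adapter is just another `Read`, it could in principle be handed to a CSV reader builder's `build` in place of the `File`, and the input would be decoded chunk by chunk as batches are requested.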

