tustvold commented on issue #4886:
URL: https://github.com/apache/arrow-rs/issues/4886#issuecomment-1894690494

   > Is this because datafusion wants to be able to process very large files by 
stream-processing the batches
   
   Yes. This matters more for file formats like Parquet, which achieve much 
higher compression ratios than Avro, but having streaming iterators is pretty 
standard practice either way.
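
   As a rough illustration of what "streaming iterator" means here, the sketch below yields fixed-size batches of lines from any buffered reader, so only one batch is held in memory at a time. It uses only the standard library; `BatchReader` and `batch_size` are hypothetical names for this example and are not part of arrow-rs or datafusion.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: yields up to `batch_size` lines at a time,
/// so the whole input never has to be resident in memory.
pub struct BatchReader<R: BufRead> {
    lines: io::Lines<R>,
    batch_size: usize,
}

impl<R: BufRead> BatchReader<R> {
    pub fn new(reader: R, batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be non-zero");
        Self {
            lines: reader.lines(),
            batch_size,
        }
    }
}

impl<R: BufRead> Iterator for BatchReader<R> {
    type Item = io::Result<Vec<String>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut batch = Vec::with_capacity(self.batch_size);
        while batch.len() < self.batch_size {
            match self.lines.next() {
                Some(Ok(line)) => batch.push(line),
                // On an I/O error the partially filled batch is dropped
                // and the error is surfaced to the caller.
                Some(Err(e)) => return Some(Err(e)),
                None => break,
            }
        }
        // An empty batch means the input is exhausted.
        if batch.is_empty() {
            None
        } else {
            Some(Ok(batch))
        }
    }
}
```

   A caller would loop with something like `for batch in BatchReader::new(reader, 1024) { ... }` and process each batch before the next one is read. A real reader would decode each batch into columnar arrays rather than strings, but the control flow is the same.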
   
   > I will notably have a look at 
[serde_arrow](https://lib.rs/crates/serde_arrow) as well for that purpose - I'm 
not sure to what extent that implementation is optimal for this purpose 
currently
   
   You might also be interested in 
https://docs.rs/arrow-json/50.0.0/arrow_json/reader/struct.Decoder.html#method.serialize
   
   > My first glance has me wonder why [the implementation is so 
complex](https://github.com/chmp/serde_arrow/blob/519c6ee4ae74904b17b12616c8400e83ab206faf/serde_arrow/src/arrow_impl/api.rs#L331-L336)
 but then I don't know too much about constructing arrow values
   
   Converting between row-oriented and columnar formats is very fiddly, 
especially where they encode nullability differently :sweat_smile: 
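
   To make the nullability point concrete, here is a minimal sketch, assuming rows that carry nullability per value as `Option<i64>`. A columnar layout in the Arrow style instead stores a dense values buffer plus a separate validity bitmap (bit set means valid, least-significant bit first). `Int64Column` and `from_rows` are hypothetical names for this example, not arrow-rs types.

```rust
/// Hypothetical nullable i64 column: dense values plus a validity bitmap.
pub struct Int64Column {
    /// One slot per row; null rows hold a placeholder value.
    pub values: Vec<i64>,
    /// Bit `i % 8` of byte `i / 8` is 1 when row `i` is non-null.
    pub validity: Vec<u8>,
    /// Number of rows in the column.
    pub len: usize,
}

/// Convert row-oriented optional values into the columnar layout.
pub fn from_rows(rows: &[Option<i64>]) -> Int64Column {
    let mut values = Vec::with_capacity(rows.len());
    // One validity bit per row, rounded up to whole bytes.
    let mut validity = vec![0u8; (rows.len() + 7) / 8];
    for (i, row) in rows.iter().enumerate() {
        match row {
            Some(v) => {
                values.push(*v);
                validity[i / 8] |= 1 << (i % 8);
            }
            // The slot still exists for a null row; only the bit says it is null.
            None => values.push(0),
        }
    }
    Int64Column {
        values,
        validity,
        len: rows.len(),
    }
}

impl Int64Column {
    /// Read row `i` back, consulting the bitmap before the values buffer.
    pub fn get(&self, i: usize) -> Option<i64> {
        if i < self.len && (self.validity[i / 8] & (1 << (i % 8))) != 0 {
            Some(self.values[i])
        } else {
            None
        }
    }
}
```

   Even for a single flat column the null information has to be split out and tracked by row index. With nested structs and lists each level has its own nullability, which is where the conversion gets fiddly.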

