tustvold commented on issue #4886: URL: https://github.com/apache/arrow-rs/issues/4886#issuecomment-1894690494
> Is this because datafusion wants to be able to process very large files by stream-processing the batches

Yes, whilst this is more important for file formats like parquet that achieve much higher compression ratios than avro, having streaming iterators is pretty standard practice.

> I will notably have a look at [serde_arrow](https://lib.rs/crates/serde_arrow) as well for that purpose - I'm not sure to what extent that implementation is optimal for this purpose currently

You might also be interested in https://docs.rs/arrow-json/50.0.0/arrow_json/reader/struct.Decoder.html#method.serialize

> My first glance has me wonder why [the implementation is so complex](https://github.com/chmp/serde_arrow/blob/519c6ee4ae74904b17b12616c8400e83ab206faf/serde_arrow/src/arrow_impl/api.rs#L331-L336) but then I don't know too much about constructing arrow values

Converting between row-oriented and columnar formats is very fiddly, especially where they encode nullability differently :sweat_smile:
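
To illustrate the nullability point, here is a minimal std-only sketch of a row-to-column round trip. The `Row`, `Int64Column` and `StringColumn` types and both functions are hypothetical, made up for this example; they are not the arrow-rs or serde_arrow APIs, and the layout is only loosely modelled on a columnar format (a plain `Vec<bool>` validity mask and `usize` offsets are simplifying assumptions).

```rust
/// A row-oriented record: nullability is expressed per field with `Option`.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub id: Option<i64>,
    pub name: Option<String>,
}

/// Hypothetical integer column: dense values plus a separate validity mask.
#[derive(Debug, Default, PartialEq)]
pub struct Int64Column {
    pub values: Vec<i64>,    // one slot per row; null slots hold a placeholder 0
    pub validity: Vec<bool>, // true = value present, false = null
}

/// Hypothetical string column: concatenated bytes, offsets, validity mask.
#[derive(Debug, Default, PartialEq)]
pub struct StringColumn {
    pub data: String,        // all string contents concatenated
    pub offsets: Vec<usize>, // offsets[i]..offsets[i + 1] delimits row i
    pub validity: Vec<bool>, // true = value present, false = null
}

/// Transpose rows into columns, moving each `Option` into the validity mask.
pub fn rows_to_columns(rows: &[Row]) -> (Int64Column, StringColumn) {
    let mut ids = Int64Column::default();
    let mut names = StringColumn::default();
    names.offsets.push(0);

    for row in rows {
        match row.id {
            Some(v) => {
                ids.values.push(v);
                ids.validity.push(true);
            }
            None => {
                // A null still occupies a value slot so indices stay aligned.
                ids.values.push(0);
                ids.validity.push(false);
            }
        }

        match &row.name {
            Some(s) => {
                names.data.push_str(s);
                names.validity.push(true);
            }
            // A null string adds no bytes; only the offset is repeated below.
            None => names.validity.push(false),
        }
        names.offsets.push(names.data.len());
    }

    (ids, names)
}

/// Rebuild rows from columns, consulting the validity mask for each slot.
/// Assumes both columns describe the same number of rows.
pub fn columns_to_rows(ids: &Int64Column, names: &StringColumn) -> Vec<Row> {
    (0..ids.values.len())
        .map(|i| Row {
            id: ids.validity[i].then(|| ids.values[i]),
            name: names.validity[i]
                .then(|| names.data[names.offsets[i]..names.offsets[i + 1]].to_string()),
        })
        .collect()
}
```

Even in this toy version, a null integer still needs a placeholder value while a null string only repeats an offset, and an empty string is distinguishable from a null only through the mask. Nested and list types multiply those cases, which is probably a large part of why real implementations look complex.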
