Ten0 commented on issue #4886:
URL: https://github.com/apache/arrow-rs/issues/4886#issuecomment-1922019175

   > I'll try to get what I have polished up over the next few days, and we can 
compare benchmarks.
   
   Here's a quick POC for full-featured Avro to Arrow using 
`serde_avro_fast`, `serde_arrow` and `serde_transcode`:
   
https://github.com/Ten0/arrow_serde_avro/blob/0ea1292064f877b210211c09d001e7b7db02fbdf/tests/simple.rs#L60-L61
   
https://github.com/Ten0/arrow_serde_avro/blob/0ea1292064f877b210211c09d001e7b7db02fbdf/src/lib.rs#L8
   
   It fits in <150 lines total ATM and successfully loads Avro object 
container files into Arrow `RecordBatch`es. ([Schema 
conversion](https://github.com/Ten0/arrow_serde_avro/blob/0ea1292064f877b210211c09d001e7b7db02fbdf/src/schema_conversion.rs#L8)
 is pretty basic ATM, but it's straightforward to add more.)
   
   Performance of serde_arrow should be very close to a zero-cost abstraction 
since https://github.com/chmp/serde_arrow/pull/120.
   The remaining clear areas of potential performance improvement for this 
particular integration are 
https://github.com/chmp/serde_arrow/pull/120#discussion_r1468664717, 
https://github.com/chmp/serde_arrow/pull/120#discussion_r1468667388 and 
https://github.com/chmp/serde_arrow/issues/92#issuecomment-1895467586, 
but those should be reasonably simple fixes.
   
   I'll probably PR that before benchmarks (if @chmp hasn't done it before 🚀)
   
   > You might also be interested in 
[arrow_json::Decoder::serialize](https://docs.rs/arrow-json/50.0.0/arrow_json/reader/struct.Decoder.html#method.serialize)
   
   It adds a significant intermediate representation (the "tape"). It 
seems pretty clear that this is why it's so far behind [in the 
benchmarks](https://github.com/chmp/serde_arrow/blob/519c6ee4ae74904b17b12616c8400e83ab206faf/Readme.md).
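   
   To illustrate what an intermediate representation costs, here is a toy, 
std-only sketch. It is not the actual code of `arrow_json` or `serde_arrow`; 
`Event`, `Columns`, `build_direct`, `build_via_tape` and `sample_rows` are 
hypothetical names made up for this example. Both paths produce the same 
columns, but the tape path allocates and walks an extra buffer holding every 
decoded value before anything reaches the builders.

```rust
/// One decoded field value; a stand-in for what a deserializer would emit.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum Event {
    Int(i64),
    Str(String),
}

/// Hypothetical two-column builder: one integer column, one string column.
#[derive(Default, Debug, PartialEq)]
struct Columns {
    ids: Vec<i64>,
    names: Vec<String>,
}

impl Columns {
    /// Append a single decoded value to the column it belongs to.
    fn push(&mut self, event: Event) {
        match event {
            Event::Int(v) => self.ids.push(v),
            Event::Str(s) => self.names.push(s),
        }
    }
}

/// Direct path: every event goes straight into its column builder.
fn build_direct(events: impl Iterator<Item = Event>) -> Columns {
    let mut cols = Columns::default();
    for event in events {
        cols.push(event);
    }
    cols
}

/// Buffered path: every event is first recorded on a "tape" (an extra
/// allocation holding the whole input), and only then replayed into
/// the column builders.
fn build_via_tape(events: impl Iterator<Item = Event>) -> Columns {
    let tape: Vec<Event> = events.collect();
    let mut cols = Columns::default();
    for event in tape {
        cols.push(event);
    }
    cols
}

/// Two sample rows, flattened into a stream of field events.
fn sample_rows() -> Vec<Event> {
    vec![
        Event::Int(1),
        Event::Str("a".to_string()),
        Event::Int(2),
        Event::Str("b".to_string()),
    ]
}
```

   Calling `build_direct(sample_rows().into_iter())` and 
`build_via_tape(sample_rows().into_iter())` gives equal `Columns`; the only 
difference is the extra pass and allocation in the second one.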

