nevi-me edited a comment on issue #1474: URL: https://github.com/apache/arrow-rs/issues/1474#issuecomment-1076549007
If possible, we could use Arrow's buffer based on the `arrow` feature, then use some abstraction (I'd be fine with `bytes`) for the other cases.

The perf cliff is whenever we create multiple small `ByteBuffer` instances (e.g. representing `vec!["hello", "there"]` as 2 instances instead of a single `ByteBuffer` with offsets into the 2 values). I think having a single buffer per page/row group would be helpful.

The upside of using Arrow's buffer is minimising/eliminating data copies. I was able to improve the Arrow side here (https://github.com/apache/arrow-rs/pull/820), and see @alamb's comment (https://github.com/apache/arrow-rs/pull/820#discussion_r724365907).

> I have a long-term hope to eventually phase out MutableBuffer and replace it with a typed construction that is easier to use without unsafe. Something with a similar interface to the ScalarBuffer I added to parquet might be a candidate

This would be great, as it seems that a lot of the safety (and some perf) issues lie there.
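
To make the "single buffer with offsets" idea concrete, here is a minimal sketch in plain Rust. `PackedValues` and its methods are hypothetical names made up for illustration; this is not the parquet `ByteBuffer`, nor Arrow's buffer API, just the general layout under the assumption that values are appended once and then read by index.

```rust
/// Hypothetical illustration type (not part of arrow-rs or parquet):
/// many byte-array values stored in one contiguous allocation.
pub struct PackedValues {
    /// All value bytes, back to back.
    data: Vec<u8>,
    /// `offsets[i]..offsets[i + 1]` is the byte range of value `i`.
    /// Always starts with 0, so it holds `len() + 1` entries.
    offsets: Vec<usize>,
}

impl PackedValues {
    pub fn new() -> Self {
        Self { data: Vec::new(), offsets: vec![0] }
    }

    /// Append one value: the bytes are copied into the shared allocation
    /// and only a single offset is recorded for it.
    pub fn push(&mut self, value: &[u8]) {
        self.data.extend_from_slice(value);
        self.offsets.push(self.data.len());
    }

    /// Number of values stored.
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    pub fn is_empty(&self) -> bool {
        self.len() == 0
    }

    /// Borrow value `i` as a slice of the shared allocation (no copy).
    pub fn get(&self, i: usize) -> Option<&[u8]> {
        let start = *self.offsets.get(i)?;
        let end = *self.offsets.get(i + 1)?;
        Some(&self.data[start..end])
    }
}
```

With this layout, pushing `b"hello"` and `b"there"` stores 10 bytes in one `Vec<u8>` plus three offsets, rather than two separately allocated buffers; `get(0)` and `get(1)` hand back slices into that one allocation.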
