nevi-me edited a comment on issue #1474:
URL: https://github.com/apache/arrow-rs/issues/1474#issuecomment-1076549007


   If possible, we could use Arrow's buffer based on the `arrow` feature, then 
use some abstraction (I'd be fine with `bytes`) for the other cases. The perf 
cliff is whenever we create multiple small `ByteBuffer` instances (e.g. 
representing `vec!["hello", "there"]` as 2 instances instead of a single 
`ByteBuffer` with offsets into the 2 values). I think having a single buffer per 
page/row group would be helpful.
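   
   To illustrate the difference, here is a minimal sketch using only the 
standard library. `PackedBytes` and `one_buffer_per_value` are hypothetical 
names made up for this example; they are not the parquet crate's `ByteBuffer` 
or Arrow's buffer type, just a rough picture of "one allocation plus offsets" 
versus "one allocation per value".

```rust
/// Hypothetical packed representation: one contiguous allocation plus offsets.
/// This is an illustration only, not an existing arrow-rs/parquet type.
pub struct PackedBytes {
    /// All values laid out back to back.
    data: Vec<u8>,
    /// `offsets[i]..offsets[i + 1]` is the byte range of value `i`.
    offsets: Vec<usize>,
}

impl PackedBytes {
    /// Copies every value into a single buffer, allocating `data` once.
    pub fn from_values(values: &[&str]) -> Self {
        let total: usize = values.iter().map(|v| v.len()).sum();
        let mut data = Vec::with_capacity(total);
        let mut offsets = Vec::with_capacity(values.len() + 1);
        offsets.push(0);
        for v in values {
            data.extend_from_slice(v.as_bytes());
            offsets.push(data.len());
        }
        Self { data, offsets }
    }

    /// Number of values stored.
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Borrows value `i` without copying it.
    pub fn value(&self, i: usize) -> &[u8] {
        &self.data[self.offsets[i]..self.offsets[i + 1]]
    }
}

/// For contrast: one small heap allocation per value.
pub fn one_buffer_per_value(values: &[&str]) -> Vec<Vec<u8>> {
    values.iter().map(|v| v.as_bytes().to_vec()).collect()
}
```

   With `PackedBytes::from_values(&["hello", "there"])` the two values share 
one data allocation and are read back as slices via `value(0)` and `value(1)`, 
whereas `one_buffer_per_value` allocates once per value.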
   
   The upside of using Arrow's buffer is minimising/eliminating data copies. I 
was able to improve the Arrow side here 
(https://github.com/apache/arrow-rs/pull/820), and see @alamb's comment 
(https://github.com/apache/arrow-rs/pull/820#discussion_r724365907).
   
   > I have a long-term hope to eventually phase out MutableBuffer and replace 
it with a typed construction that is easier to use without unsafe. Something 
with a similar interface to the ScalarBuffer I added to parquet might be a 
candidate
   
   This would be great, as it seems that a lot of the safety (and some perf) 
issues lie there.



