alamb opened a new issue, #7899:
URL: https://github.com/apache/arrow-rs/issues/7899
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
This came up in conversations with @friendlymatthew and @zeroshade today.
Given this example:
```rust
let mut builder = VariantBuilder::new();
// the sub-builder allocates a new buffer
let mut obj = builder.new_object();
obj.insert("a", 1);
// finishing the sub-builder copies its data into the parent builder's buffer
obj.finish()?;
```
Here is the buffer used by the ObjectBuilder:
https://github.com/apache/arrow-rs/blob/34bb605a0ca5ce7f03de0116023fb2cac6b669b3/parquet-variant/src/builder.rs#L817-L816
Here is where it is copied to the parent builder:
https://github.com/apache/arrow-rs/blob/34bb605a0ca5ce7f03de0116023fb2cac6b669b3/parquet-variant/src/builder.rs#L936-L935
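To make the cost concrete, here is a minimal sketch of the pattern described above. It is not the actual arrow-rs code: `BufferedObjectBuilder`, `append_value`, and the toy layout (one count byte, one `u8` offset per value plus an end offset, then the value bytes) are assumptions made for illustration and are not the real Variant encoding.

```rust
/// Hypothetical child builder that owns a scratch buffer (the extra allocation).
struct BufferedObjectBuilder<'a> {
    parent: &'a mut Vec<u8>,
    buffer: Vec<u8>,           // separate allocation, copied into `parent` on finish
    field_offsets: Vec<usize>, // start of each value within `buffer`
}

impl<'a> BufferedObjectBuilder<'a> {
    fn new(parent: &'a mut Vec<u8>) -> Self {
        Self { parent, buffer: Vec::new(), field_offsets: Vec::new() }
    }

    /// Values go into the child's own buffer, not the parent's.
    fn append_value(&mut self, value: &[u8]) {
        self.field_offsets.push(self.buffer.len());
        self.buffer.extend_from_slice(value);
    }

    /// Writes the toy header to the parent, then copies the scratch buffer over.
    fn finish(self) {
        let Self { parent, buffer, field_offsets } = self;
        // The toy layout stores counts and offsets as single bytes.
        assert!(field_offsets.len() <= u8::MAX as usize);
        assert!(buffer.len() <= u8::MAX as usize);
        parent.push(field_offsets.len() as u8);
        for &off in &field_offsets {
            parent.push(off as u8);
        }
        parent.push(buffer.len() as u8); // end offset
        parent.extend_from_slice(&buffer); // the copy into the parent
    }
}

/// Example: one object with two values appended after an existing byte.
fn build_buffered_example() -> Vec<u8> {
    let mut parent = vec![0xAA];
    let mut obj = BufferedObjectBuilder::new(&mut parent);
    obj.append_value(&[1, 2]);
    obj.append_value(&[3]);
    obj.finish();
    parent
}
```

In this sketch every nested object pays for one scratch `Vec` plus one copy of its value bytes.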
**Describe the solution you'd like**
What I would like to do is avoid the extra allocation to improve performance.
**Describe alternatives you've considered**
Here is an approach that must still copy the child object bytes but does not
use its own allocation. It is modeled after a description from @zeroshade of
how the Go implementation works:
1. Change the ObjectBuilder so it remembers where the object should start in
the parent's buffer
2. Remove the `ObjectBuilder::buffer` field
3. On append, the ObjectBuilder writes directly into the parent's buffer
4. On `ObjectBuilder::finish`, compute how much space is needed for the
header and offsets, and shift (by copy) the child object bytes down by that
amount in the parent's buffer
5. Fill in the object header + offsets for the child object
6. Return
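The steps above could look roughly like the following sketch. It is not the arrow-rs or Go implementation: `InPlaceObjectBuilder`, `append_value`, and the toy layout (one count byte, one `u8` offset per value plus an end offset, then the value bytes) are assumptions for illustration and not the real Variant encoding. The shift in step 4 uses `Vec::resize` followed by the standard `copy_within`.

```rust
/// Hypothetical child builder that writes straight into the parent's buffer.
struct InPlaceObjectBuilder<'a> {
    parent: &'a mut Vec<u8>,
    start: usize,              // step 1: where the object starts in the parent
    field_offsets: Vec<usize>, // start of each value, relative to `start`
}

impl<'a> InPlaceObjectBuilder<'a> {
    fn new(parent: &'a mut Vec<u8>) -> Self {
        let start = parent.len();
        Self { parent, start, field_offsets: Vec::new() }
    }

    /// Step 3: value bytes are appended directly to the parent's buffer.
    fn append_value(&mut self, value: &[u8]) {
        self.field_offsets.push(self.parent.len() - self.start);
        self.parent.extend_from_slice(value);
    }

    /// Steps 4 to 6: make room for the header, shift the values, fill the header.
    fn finish(self) {
        let Self { parent, start, field_offsets } = self;
        let old_len = parent.len();
        let data_len = old_len - start;
        // The toy layout stores counts and offsets as single bytes.
        assert!(field_offsets.len() <= u8::MAX as usize);
        assert!(data_len <= u8::MAX as usize);

        // Step 4: count byte + one offset per value + end offset.
        let header_len = 1 + field_offsets.len() + 1;
        parent.resize(old_len + header_len, 0);
        // Shift the value bytes toward the end of the buffer (memmove semantics).
        parent.copy_within(start..old_len, start + header_len);

        // Step 5: fill in the header in the gap that was just opened.
        let mut pos = start;
        parent[pos] = field_offsets.len() as u8;
        pos += 1;
        for &off in &field_offsets {
            parent[pos] = off as u8;
            pos += 1;
        }
        parent[pos] = data_len as u8; // end offset
    }
}

/// Example: one object with two values appended after an existing byte.
fn build_in_place_example() -> Vec<u8> {
    let mut parent = vec![0xAA];
    let mut obj = InPlaceObjectBuilder::new(&mut parent);
    obj.append_value(&[1, 2]);
    obj.append_value(&[3]);
    obj.finish();
    parent
}
```

The value bytes are still copied once (the shift), but no per-object scratch buffer is allocated; the only growth is the parent buffer's own resize.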
Ideally we would see some performance improvement in the benchmarks.
**Additional context**
If this works out, I think we can do a similar optimization for ListBuilder.