PinkCrow007 commented on issue #46555: URL: https://github.com/apache/arrow/issues/46555#issuecomment-2929299578
I really like the single-buffer builder design, @zeroshade. It's impressively efficient, avoids extra allocations, and makes lifecycle management a lot simpler.

I've been working on a nested builder in [Rust](https://github.com/apache/arrow-rs/issues/7424), and honestly, it's significantly less efficient than this approach: it uses intermediate buffers and multiple copies, which hurts performance. The main benefit, though, is API ergonomics: nested builders feel more intuitive to users. (It's still not entirely user-friendly: due to the borrow checker, users must call `.finish()` on one nested object before starting another.)

Regarding key sorting: I believe the problem stems from the Variant format itself. Since field_ids are determined by the sorted dictionary in the metadata, we either need to sort all keys upfront or patch field_ids after the fact. In my current implementation, I follow the latter approach: when sorted_strings is set to true, the builder walks back and patches every field header during the finalization of the root builder, but this isn't very efficient or safe. I'm still looking for better ways to handle this.

Overall, I think this design space presents a tradeoff:

- Single builders optimize for efficiency and buffer reuse, but expose a lower-level API.
- Nested builders are easier for users, but are harder to make efficient.

It's been really helpful to see this Go implementation and the surrounding discussion. I agree that establishing best practices here will be crucial as more language implementations emerge.
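To make the "patch field_ids after the fact" option from the key-sorting discussion above concrete, here is a minimal sketch. It is not the arrow-rs implementation: `ToyBuilder`, `add_key`, and `finish` are hypothetical names, and the provisional ids are kept in a plain `Vec<u32>` rather than inside encoded field headers, so it only illustrates the remapping step and not the real byte-level patching.

```rust
use std::collections::HashMap;

/// Hypothetical toy builder (an assumption for illustration only).
/// Keys get provisional ids in first-seen order; ids are remapped
/// once the dictionary is sorted at finalization.
#[derive(Default)]
struct ToyBuilder {
    dict: Vec<String>,           // dictionary in first-seen order
    index: HashMap<String, u32>, // key -> provisional id
    field_ids: Vec<u32>,         // provisional ids, one per written field
}

impl ToyBuilder {
    /// Record a field key and return its provisional (insertion-order) id.
    fn add_key(&mut self, key: &str) -> u32 {
        let existing = self.index.get(key).copied();
        let id = match existing {
            Some(id) => id,
            None => {
                let id = self.dict.len() as u32;
                self.dict.push(key.to_string());
                self.index.insert(key.to_string(), id);
                id
            }
        };
        self.field_ids.push(id);
        id
    }

    /// Sort the dictionary, then walk back and patch every recorded id.
    /// Returns the sorted dictionary and the patched field ids.
    fn finish(mut self) -> (Vec<String>, Vec<u32>) {
        // `order[new_id]` is the old id of the key that sorts into `new_id`.
        let mut order: Vec<u32> = (0..self.dict.len() as u32).collect();
        order.sort_by(|&a, &b| self.dict[a as usize].cmp(&self.dict[b as usize]));

        // Invert it: `remap[old_id]` is the new id.
        let mut remap = vec![0u32; order.len()];
        for (new_id, &old_id) in order.iter().enumerate() {
            remap[old_id as usize] = new_id as u32;
        }

        // The patch pass: every field written so far has to be revisited.
        for id in self.field_ids.iter_mut() {
            *id = remap[*id as usize];
        }

        let sorted: Vec<String> = order
            .iter()
            .map(|&old| self.dict[old as usize].clone())
            .collect();
        (sorted, self.field_ids)
    }
}
```

Adding the keys `zebra`, `apple`, `mango`, `apple` assigns provisional ids 0, 1, 2, 1; `finish` then sorts the dictionary to `apple`, `mango`, `zebra` and rewrites the ids to 2, 0, 1, 0. Even in this toy form the cost is visible: the patch pass touches every field that was ever written, which is the inefficiency the comment refers to. Sorting all keys upfront would avoid that pass, but it requires knowing the full key set before any field is written.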