scovich commented on PR #8031: URL: https://github.com/apache/arrow-rs/pull/8031#issuecomment-3145302958
> Why is `PackedU32Iterator` + `extend` the slowest? Is it because it calls multiple `extend`, which results in data movement? (using `PackedU32Iterator` allows `extend` to take advantage of `size_hint`). I suspect it's because the compiler can't "see through" the packed iterator like it can with the append+truncate approach. So the former produces 1-4 individual byte appends, each with its own bounds check and length change, while the latter produces a single 32-bit append followed by a truncate. > For the unsafe code with `write_u32_at_pos`, I tried it, but it requires further changes; otherwise, it will panic in `write_u32_at_pos` because there is no more room. If you hit a buffer overflow panic, I think that means you tried to use the "fast" version to write the `Some(data_size)` entry, after the loop exits. Or did you hit a problem in the loop itself? > I think it might be better to stick with the current solution: 1) The solution in the current PR is more complex than the original, but the performance improvement isn't significant. The same argument likely applies to the object builder. Should we consider reverting that back to match the list builder, for simplicity and maintainability? I'd have to go back and check the other PR, but was the perf improvement actually significant there? If so=> why did it not work for list builder? Do we not have apples-to-apples benchmarking? If not => probably good to revert. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org