scovich commented on PR #8031:
URL: https://github.com/apache/arrow-rs/pull/8031#issuecomment-3145302958

   > Why is `PackedU32Iterator` + `extend` the slowest? Is it because it calls 
multiple `extend`, which results in data movement? (using `PackedU32Iterator` 
allows `extend` to take advantage of `size_hint`).
   
   I suspect it's because the compiler can't "see through" the packed iterator 
like it can with the append+truncate approach. So the former produces 1-4 
individual byte appends, each with its own bounds check and length change, 
while the latter produces a single 32-bit append followed by a truncate.
   
   > For the unsafe code with `write_u32_at_pos`, I tried it, but it requires 
further changes; otherwise, it will panic in `write_u32_at_pos` because there 
is no more room.
   
   If you hit a buffer overflow panic, I think that means you tried to use the 
"fast" version to write the `Some(data_size)` entry, after the loop exits. Or 
did you hit a problem in the loop itself?
   
   > I think it might be better to stick with the current solution: 1) The 
solution in the current PR is more complex than the original, but the 
performance improvement isn't significant.
   
   The same argument likely applies to the object builder. Should we consider 
reverting that back to match the list builder, for simplicity and 
maintainability? I'd have to go back and check the other PR, but was the perf 
improvement actually significant there? 
   
   If so=> why did it not work for list builder? Do we not have 
apples-to-apples benchmarking? 
   
   If not => probably good to revert. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to