scovich commented on PR #8031: URL: https://github.com/apache/arrow-rs/pull/8031#issuecomment-3141439602
> I've tried to benchmark three implementations

Thanks for taking the time to do this!

> The results show that the first win(but not too much).

Technically, the first two are equivalent because each result's mean is comfortably inside the other's confidence window. While it may be suggestive that the temp buf approach has a higher mean in all four results, any real difference is dwarfed by noise:

| group | PackedU32Iterator | append into tmp buf |
|-|-|-|
| json_list | 28.4±1.05ms | 28.5±1.09ms |
| random_json | 324.6±8.35ms | 328.9±8.20ms |
| repeated_struct | 11.0±0.40ms | 11.3±0.31ms |
| variant_get_primitive | 1105.8±33.81µs | 1106.9±40.65µs |

(aside: why would the `variant_get_primitive` case (= reading) even be relevant for variant _building_?)

Meanwhile, it's unsurprising in retrospect that the complex iterator approach would be slower than the append+truncate approach, given how simple (branchless!) the latter's machine code is -- see the sketch at the end of this comment.

> for the `mis-calculated header` problem, can we use ut to cover the header size calculation logic?

I mean, yes we can. But given a choice between eliminating a bug surface entirely from the code base and trying to validate it with unit tests that could be incomplete and/or go out of sync? I almost always favor eliminating the possibility of bugs over testing for them -- unless the perf and/or complexity difference is large enough to make me question that default choice.

> do we need to bench for different header metadata such as `is_large` is true?

If we care about the performance (and correctness) of that case, it seems like we need benchmarks and unit tests to cover it?
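For reference, a minimal sketch of the append+truncate idea as I understand it (the helper name `append_packed_u32s` is hypothetical, not the actual code in this PR): every offset is unconditionally written as four little-endian bytes and the buffer is then truncated back to the chosen width, so the per-offset loop body contains no branch on the width.

```rust
/// Hypothetical helper illustrating the append+truncate pattern (not the
/// actual implementation in this PR). Appends each offset using `nbytes`
/// little-endian bytes (1..=4).
fn append_packed_u32s(buf: &mut Vec<u8>, offsets: &[u32], nbytes: usize) {
    debug_assert!((1..=4).contains(&nbytes));
    for &offset in offsets {
        let start = buf.len();
        // Always append the full 4-byte little-endian encoding...
        buf.extend_from_slice(&offset.to_le_bytes());
        // ...then drop the unneeded high-order bytes. `truncate` is just a
        // length adjustment, so the loop compiles to straight-line code.
        buf.truncate(start + nbytes);
    }
}

fn main() {
    let mut buf = Vec::new();
    append_packed_u32s(&mut buf, &[1, 300], 2);
    assert_eq!(buf, [1, 0, 44, 1]); // 1 = 0x0001, 300 = 0x012C (LE)
}
```

Compare that to an iterator that has to decide, per byte, how many bytes each offset still owes: the width-selection logic disappears from the hot loop entirely.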