scovich commented on PR #8031: URL: https://github.com/apache/arrow-rs/pull/8031#issuecomment-3141439602
> I've tried to benchmark three implementations

Thanks for taking the time to do this!

> The results show that the first win(but not too much).

Technically, the first two are equivalent because each result's mean is comfortably inside the other's confidence window. While it may be suggestive that the temp buf approach has a higher mean in all four results, any real difference is dwarfed by noise:

| group | PackedU32Iterator | append into tmp buf |
|-|-|-|
| json_list | 28.4±1.05ms | 28.5±1.09ms |
| random_json | 324.6±8.35ms | 328.9±8.20ms |
| repeated_struct | 11.0±0.40ms | 11.3±0.31ms |
| variant_get_primitive | 1105.8±33.81µs | 1106.9±40.65µs |

(aside: why would the `variant_get_primitive` case (= reading) even be relevant for variant _building_?)

Meanwhile, it's unsurprising in retrospect that the complex iterator approach would be slower than the append+truncate approach, given how simple (branchless!) the latter's machine code is -- see the sketch at the end of this comment.

> for the `mis-calculated header` problem, can we use ut to cover the header size calculation logic?

I mean, yes we can. But given a choice between eliminating a bug surface entirely from the code base and trying to validate it with unit tests that could be incomplete and/or go out of sync? I almost always favor eliminating the possibility of bugs over testing for them -- unless the perf and/or complexity difference is large enough to make me question that default choice.

> do we need to bench for different header metadata such as `is_large` is true?

If we care about the performance (and correctness) of that case, it seems like we need benchmarks and unit tests to cover it?
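For reference, a minimal sketch of the append+truncate idea as I understand it (the helper name `append_packed_u32s` is hypothetical, not the actual code in this PR): every offset is unconditionally written as four little-endian bytes and the buffer is then truncated back to the chosen width, so the per-offset loop body contains no branch on the width.

```rust
/// Hypothetical helper illustrating the append+truncate pattern (not the
/// actual implementation in this PR). Appends each offset using `nbytes`
/// little-endian bytes (1..=4).
fn append_packed_u32s(buf: &mut Vec<u8>, offsets: &[u32], nbytes: usize) {
    debug_assert!((1..=4).contains(&nbytes));
    for &offset in offsets {
        let start = buf.len();
        // Always append the full 4-byte little-endian encoding...
        buf.extend_from_slice(&offset.to_le_bytes());
        // ...then drop the unneeded high-order bytes. `truncate` is just a
        // length adjustment, so the loop compiles to straight-line code.
        buf.truncate(start + nbytes);
    }
}

fn main() {
    let mut buf = Vec::new();
    append_packed_u32s(&mut buf, &[1, 300], 2);
    assert_eq!(buf, [1, 0, 44, 1]); // 1 = 0x0001, 300 = 0x012C (LE)
}
```

Compare that to an iterator that has to decide, per byte, how many bytes each offset still owes: the width-selection logic disappears from the hot loop entirely.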