joellubi commented on PR #43066: URL: https://github.com/apache/arrow/pull/43066#issuecomment-2215029656
@mapleFU @zeroshade I pushed up some changes to the decoders that align them more closely with the current C++ implementation, and added a new benchmark for batched decoding. All benchmarks are updated in the PR description.

Overall, the batched approach improves decoding performance slightly across the board. This is most likely because an intermediary buffer is no longer needed: batches can be decoded directly into the output buffer. The new benchmark shows little difference in performance between one-batch-per-page and many-batches-per-page decoding. There may be bigger differences for extremely small batch sizes, but I did my best to pick a realistic number. Memory usage is, of course, lower with the batched approach: we write directly into the output buffer and don't have to allocate pageSize bytes per column reader to decode everything at once.
