joellubi commented on PR #43066: URL: https://github.com/apache/arrow/pull/43066#issuecomment-2215029656
@mapleFU @zeroshade I pushed up some changes to the decoders that align them more closely with the current C++ implementation, and added a new benchmark for batched decoding. All benchmarks are updated in the PR description.

Overall, the batched approach improves decoding performance slightly across the board. This is most likely because an intermediary buffer is no longer needed: batches can be decoded directly into the output buffer. The new benchmark shows little difference in performance between one-batch-per-page and many-batches-per-page decoding. There may be bigger differences for extremely small batch sizes, but I did my best to pick a realistic number. Memory usage is, of course, lower with the batched approach: we write directly into the output buffer and don't have to allocate pageSize bytes per column reader to decode everything at once.
