Re: [PR] GH-41640: [Go] Implement BYTE_STREAM_SPLIT Parquet Encoding [arrow]

via GitHub Mon, 08 Jul 2024 08:02:47 -0700


zeroshade commented on PR #43066:
URL: https://github.com/apache/arrow/pull/43066#issuecomment-2214364440


   I think we should at least try @mapleFU's idea of following the C++ impl and 
decoding incrementally via strides, if possible, and then compare the benchmarks 
(possibly making it a config option?).
   
   Ultimately the trade-off is this: the current impl uses extra memory to 
fully decode the entire page when `SetData` is called, which makes partial 
decodes cheap. The strided approach uses less memory, but partial decodes get 
more expensive since we'd be jumping around the page to gather each value's 
bytes, one per stride.
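   
   To make the two options concrete, here is a minimal sketch for `float32` 
values. The types `eagerDecoder` and `stridedDecoder` and the helper `encodeBSS` 
are hypothetical names for illustration, and the `SetData`/`Decode` signatures 
are simplified stand-ins, not the actual decoder interface in the Go Parquet 
package. It only assumes the BYTE_STREAM_SPLIT layout, where byte `k` of value 
`i` lives at `data[k*nvals+i]`.

```go
package main

import (
	"encoding/binary"
	"math"
)

const width = 4 // bytes per float32 value

// encodeBSS (hypothetical helper) lays values out in BYTE_STREAM_SPLIT order:
// byte k of value i is stored at out[k*n+i].
func encodeBSS(vals []float32) []byte {
	n := len(vals)
	out := make([]byte, n*width)
	var tmp [width]byte
	for i, v := range vals {
		binary.LittleEndian.PutUint32(tmp[:], math.Float32bits(v))
		for k := 0; k < width; k++ {
			out[k*n+i] = tmp[k]
		}
	}
	return out
}

// eagerDecoder (hypothetical) re-interleaves the whole page in SetData.
// Cost: one extra page-sized buffer. Benefit: Decode is a sequential read.
type eagerDecoder struct {
	plain []byte
	pos   int
}

func (d *eagerDecoder) SetData(nvals int, data []byte) {
	d.plain = make([]byte, nvals*width)
	for k := 0; k < width; k++ {
		for i, b := range data[k*nvals : (k+1)*nvals] {
			d.plain[i*width+k] = b
		}
	}
	d.pos = 0
}

func (d *eagerDecoder) Decode(dst []float32) int {
	n := len(dst)
	if rem := len(d.plain)/width - d.pos; n > rem {
		n = rem
	}
	for i := 0; i < n; i++ {
		off := (d.pos + i) * width
		dst[i] = math.Float32frombits(binary.LittleEndian.Uint32(d.plain[off:]))
	}
	d.pos += n
	return n
}

// stridedDecoder (hypothetical) keeps only the encoded page and gathers each
// value's bytes from the `width` streams on demand. No extra buffer, but every
// value touches `width` locations that are nvals bytes apart.
type stridedDecoder struct {
	data  []byte
	nvals int
	pos   int
}

func (d *stridedDecoder) SetData(nvals int, data []byte) {
	d.data, d.nvals, d.pos = data, nvals, 0
}

func (d *stridedDecoder) Decode(dst []float32) int {
	n := len(dst)
	if rem := d.nvals - d.pos; n > rem {
		n = rem
	}
	var tmp [width]byte
	for i := 0; i < n; i++ {
		for k := 0; k < width; k++ {
			tmp[k] = d.data[k*d.nvals+d.pos+i]
		}
		dst[i] = math.Float32frombits(binary.LittleEndian.Uint32(tmp[:]))
	}
	d.pos += n
	return n
}
```

   Both decoders return the same values for any sequence of `Decode` calls; they 
differ only in where the work and the memory go, which is what the benchmarks 
would need to measure.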
   
   I'd be curious what the effect would be on performance in two scenarios:
   
   1. multiple small reads that together consume a whole page
   2. reading a whole page plus part of the next page
   
   Is it worthwhile implementing it in this PR, or should we look at it as a follow-up?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

