zeroshade commented on PR #43066: URL: https://github.com/apache/arrow/pull/43066#issuecomment-2214364440
I think we should at least try @mapleFU's idea of following the C++ impl and decoding incrementally via strides, if possible, and then compare the benchmarks (possibly making it a config option?).

Ultimately the trade-off is this: the current impl uses extra memory to fully decode the entire page when `SetData` is called, which makes partial decodes faster. The stride-based approach uses less memory, but partial decodes are more expensive since we'd be jumping around to decode values per stride.

I'd be curious what the effect on performance would be in two scenarios:

1. multiple reads of a smaller number of values to read a whole page
2. reading a whole page + part of the next page

Is it worthwhile implementing it? Or should we look at it as a follow-up?
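To make the trade-off concrete, here is a rough sketch of the two strategies. It is not the code from this PR or from the C++ impl. It assumes a strided page layout in which byte `k` of every value is stored contiguously (as in a byte-stream-split style encoding), and the names `gather`, `eagerDecoder`, and `lazyDecoder` are hypothetical, chosen only for illustration.

```go
package main

import "fmt"

// gather copies n values, starting at value index start, out of a strided
// page into dst in interleaved (value-by-value) order.
// Assumed layout: `width` streams of len(page)/width bytes each, where
// stream k holds byte k of every value.
func gather(dst, page []byte, width, start, n int) {
	stride := len(page) / width
	for i := 0; i < n; i++ {
		for k := 0; k < width; k++ {
			dst[i*width+k] = page[k*stride+start+i]
		}
	}
}

// eagerDecoder (hypothetical) decodes the whole page in SetData.
// It costs an extra page-sized buffer, but each Decode is a cheap slice.
type eagerDecoder struct {
	width   int
	decoded []byte
}

func (d *eagerDecoder) SetData(page []byte, width int) {
	d.width = width
	d.decoded = make([]byte, len(page))
	gather(d.decoded, page, width, 0, len(page)/width)
}

// Decode returns up to n values; the result aliases the internal buffer.
func (d *eagerDecoder) Decode(n int) []byte {
	if rem := len(d.decoded) / d.width; n > rem {
		n = rem
	}
	out := d.decoded[:n*d.width]
	d.decoded = d.decoded[n*d.width:]
	return out
}

// lazyDecoder (hypothetical) keeps only the raw page and a position.
// It needs no page-sized scratch buffer, but every Decode jumps across
// all `width` streams to assemble the requested values.
type lazyDecoder struct {
	width, pos int
	page       []byte
}

func (d *lazyDecoder) SetData(page []byte, width int) {
	d.page, d.width, d.pos = page, width, 0
}

// Decode gathers up to n values from the strides into a fresh buffer.
func (d *lazyDecoder) Decode(n int) []byte {
	if rem := len(d.page)/d.width - d.pos; n > rem {
		n = rem
	}
	out := make([]byte, n*d.width)
	gather(out, d.page, d.width, d.pos, n)
	d.pos += n
	return out
}

func main() {
	// Three 2-byte values {1,2}, {3,4}, {5,6} stored as two streams.
	page := []byte{1, 3, 5, 2, 4, 6}

	e := &eagerDecoder{}
	e.SetData(page, 2)
	l := &lazyDecoder{}
	l.SetData(page, 2)

	// Scenario 1 in miniature: several small reads covering one page.
	fmt.Println(e.Decode(2), l.Decode(2))
	fmt.Println(e.Decode(2), l.Decode(2))
}
```

Both decoders return the same values. They differ only in where the strided gather happens: once up front in `SetData`, or on every `Decode` call. A benchmark of the two scenarios above would mainly be measuring that difference.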
