Anieway commented on PR #36183: URL: https://github.com/apache/arrow/pull/36183#issuecomment-1606015847
@pitrou
> While the `min_delta_` might be useful, I don't believe that keeping track of a separate `max_deltas_` array is actually efficient.
>
> Besides, your PR is buggy and doesn't pass the unit tests (see the ongoing CI results).

I timed encoding with this approach on two different platforms. Throughput increased on one, while performance on the other was not affected at all. So keeping track of the maximum deltas appears to be at least as efficient as re-iterating over every mini-block, judging by ~500 measurements taken on that platform (Intel Core i5-8365UE CPU, 4 physical cores @ 1.6 GHz, 8 GB RAM).

I am struggling to find failed tests that are related to the changes contained in this PR. If someone could point me to a test case that shows a defect in this PR's changes, I am happy to look into it.
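
For readers following the thread, here is a minimal sketch of the two strategies being compared: rescanning each mini-block for its maximum delta at flush time versus tracking a per-mini-block maximum while deltas are appended. This is not the code from the PR or from Arrow. `BitWidth`, `WidthsByRescan`, and `TrackingBlock` are hypothetical names made up for illustration, and only the member names `min_delta_` and `max_deltas_` are borrowed from the discussion.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper (not an Arrow function): bits needed to represent v.
static int BitWidth(uint64_t v) {
  int bits = 0;
  while (v != 0) {
    ++bits;
    v >>= 1;
  }
  return bits;
}

// Strategy A: at flush time, iterate over every mini-block again to find
// its maximum delta, then derive the bit width relative to the block minimum.
std::vector<int> WidthsByRescan(const std::vector<int64_t>& deltas,
                                std::size_t values_per_mini_block) {
  std::vector<int> widths;
  if (deltas.empty()) return widths;
  const int64_t min_delta = *std::min_element(deltas.begin(), deltas.end());
  for (std::size_t start = 0; start < deltas.size();
       start += values_per_mini_block) {
    const std::size_t end =
        std::min(start + values_per_mini_block, deltas.size());
    int64_t max_delta = deltas[start];
    for (std::size_t i = start + 1; i < end; ++i) {
      max_delta = std::max(max_delta, deltas[i]);
    }
    // Unsigned subtraction avoids signed overflow for extreme deltas.
    widths.push_back(BitWidth(static_cast<uint64_t>(max_delta) -
                              static_cast<uint64_t>(min_delta)));
  }
  return widths;
}

// Strategy B: update the block minimum and a per-mini-block maximum as each
// delta is appended, so no second pass over the deltas is needed at flush.
class TrackingBlock {
 public:
  explicit TrackingBlock(std::size_t values_per_mini_block)
      : values_per_mini_block_(values_per_mini_block) {}

  void Append(int64_t delta) {
    const std::size_t mini_block = count_ / values_per_mini_block_;
    if (mini_block == max_deltas_.size()) {
      max_deltas_.push_back(delta);  // first delta of a new mini-block
    } else {
      max_deltas_[mini_block] = std::max(max_deltas_[mini_block], delta);
    }
    min_delta_ = std::min(min_delta_, delta);
    ++count_;
  }

  std::vector<int> Widths() const {
    std::vector<int> widths;
    for (int64_t max_delta : max_deltas_) {
      widths.push_back(BitWidth(static_cast<uint64_t>(max_delta) -
                                static_cast<uint64_t>(min_delta_)));
    }
    return widths;
  }

 private:
  std::size_t values_per_mini_block_;
  std::size_t count_ = 0;
  int64_t min_delta_ = std::numeric_limits<int64_t>::max();
  std::vector<int64_t> max_deltas_;
};
```

Both variants should yield the same bit widths for the same input. The trade-off is one extra comparison and store per appended value in strategy B against a second pass over the buffered deltas in strategy A, which is the cost difference the timings above are meant to probe.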
