vertexclique edited a comment on pull request #8598: URL: https://github.com/apache/arrow/pull/8598#issuecomment-723261667
> So my feedback here is that it is not clear to me what this PR is trying to accomplish (aka answer the question of why make the changes in this pr) and thus it is not clear how to review / evaluate it. Here comes the explanation @alamb and team: > If the PR's aim is to fix bug, I would expect some explanation / demonstration / of something that fails without the changes in the PR and passes with changes in the PR. The bugs this PR's changes fixes are probably obvious to you, but sadly they are not to me :( 1. Bugfix: Other pr (#8571) fixes one offset problem. This pr fixes it intrinsically, tests are added for that in commit: d6e4744 . Moreover, data fusion tests are not using machine epsilon (small read: https://floating-point-gui.de/errors/comparison/), and I have just implemented an assertion method to be used with data fusion tests in c108026 . That was yet another bug. Now a small note: I didn't add these tests initially to not to be understood as rude against @jhorstmann and the pr opened there, but as you can see I have committed the exact tests in d6e4744 with the co-authoring feature to make it cumulative effort. 2. Extensibility: Now iterators can be extended with different iterator types. Now, if you want, bit view can dispense not exact size chunk iterator or bit by bit iterator. Whatever you like. Just adding the wrapping iterator for cases makes it easier. > If the PR's aim is to add support for endianness, I would expect some demonstration that the new code can do something that the old code can't (aka tests) 3. Architecture support: Now it can compile and run on big-endian architectures. Still, we have work to do there but we will get there eventually. For big-endian tests are written in c7428fb . Moreover, I think we should write more generic implementations, like how we are doing over the last 4 months, and still support platforms that we have promised. Personally, I don't want to write too much architecture-specific code in Rust to make it work over the upcoming months, and I can also advocate for that for the members of the Arrow Rust team. In the C++ version, I saw these and you can infer from how it is hard to support multiple platforms: https://github.com/apache/arrow/pull/7507/files#diff-c3b0484ad8586ff46fa035d446a7d1c3a30cd35d13cd05678c99814938e07d5bR78-R214 Here you can see mips (be) test results: ``` Compiling arrow v3.0.0-SNAPSHOT (/project) Finished test [unoptimized + debuginfo] target(s) in 5.64s Running /target/mips-unknown-linux-gnu/debug/deps/arrow-ba04cf069343d58e running 6 tests test util::bit_slice_iterator::tests_bit_slices_big_endian::test_bit_slice_iter_aligned ... ok test util::bit_slice_iterator::tests_bit_slices_big_endian::test_bit_slice_iter_reinterpret ... ok test util::bit_slice_iterator::tests_bit_slices_big_endian::test_bit_slice_iter_unaligned ... ok test util::bit_slice_iterator::tests_bit_slices_big_endian::test_bit_slice_iter_unaligned_remainder_1_byte ... ok test util::bit_slice_iterator::tests_bit_slices_big_endian::test_bit_slice_iter_unaligned_remainder_bits_across_bytes ... ok test util::bit_slice_iterator::tests_bit_slices_big_endian::test_bit_slice_iter_unaligned_remainder_bits_large ... ok ``` Here you can see armv7 (le) test results: ``` Finished test [unoptimized + debuginfo] target(s) in 0.11s Running /target/armv7-unknown-linux-gnueabihf/debug/deps/arrow-6ffb743de7744875 running 6 tests test util::bit_slice_iterator::tests_bit_slices_little_endian::test_bit_slice_iter_aligned ... ok test util::bit_slice_iterator::tests_bit_slices_little_endian::test_bit_slice_iter_reinterpret ... ok test util::bit_slice_iterator::tests_bit_slices_little_endian::test_bit_slice_iter_unaligned ... ok test util::bit_slice_iterator::tests_bit_slices_little_endian::test_bit_slice_iter_unaligned_remainder_1_byte ... ok test util::bit_slice_iterator::tests_bit_slices_little_endian::test_bit_slice_iter_unaligned_remainder_bits_across_bytes ... ok test util::bit_slice_iterator::tests_bit_slices_little_endian::test_bit_slice_iter_unaligned_remainder_bits_large ... ok ``` 4. Preventing bugs: As you can see the C++ implementation's sophisticated code, it is really easy to make mistakes in this field, while doing bit shaking, bit twiddling etc. You might carry one bit right but you don't consider the carry and it works for a long time until we realize that it is not working anymore. So abstracting some things from the development is always good from my point of view. And I find this pragmatically correct for this case. > If the PR's aim is to make the code easier to understand, I would expect some description of why the new code is easier to understand than the old (which will be a subjective judgement, for sure). 4. Flexibility: You can see byte reinterpretation as I've mentioned/promised before in the tests contained in commit c7428fb . Moreover, you can see that the new implementation without comments is only 100 lines exact. Also, views, buffers, iterators, bit sequence interpretation is completely separate. Obviously, as you said, that is subjective to the reader. I find the separation better atm. > Since this PR seems to have elements of all three goals, but is light on the explanation, I am struggling to evaluate it concisely I hope I have answered all your questions. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
