vertexclique commented on pull request #8598:
URL: https://github.com/apache/arrow/pull/8598#issuecomment-723261667


   > So my feedback here is that it is not clear to me what this PR is trying 
to accomplish (aka answer the question of why make the changes in this pr) and 
thus it is not clear how to review / evaluate it.
   
   Here comes the explanation @alamb and team:
   > If the PR's aim is to fix bug, I would expect some explanation / 
demonstration / of something that fails without the changes in the PR and 
passes with changes in the PR. The bugs this PR's changes fixes are probably 
obvious to you, but sadly they are not to me :(
   
   1. Bugfix: Other pr (#8571) fixes one offset problem. This pr fixes it 
intrinsically, tests are added for that in commit: d6e4744 . Moreover, data 
fusion tests are not using machine epsilon (small read: 
https://floating-point-gui.de/errors/comparison/), and I have just implemented 
an assertion method to be used with data fusion tests in c108026 . That was yet 
another bug. Now a small note: I didn't add these tests initially to not to be 
understood as rude against @jhorstmann and the pr opened there, but as you can 
see I have committed the exact tests in d6e4744 with the co-authoring feature 
to make it cumulative effort.
   
   2. Extensibility: Now iterators can be extended with different iterator 
types. Now, if you want, bit view can dispense not exact size chunk iterator or 
bit by bit iterator. Whatever you like. Just adding the wrapping iterator for 
cases makes it easier.
   
   > If the PR's aim is to add support for endianness, I would expect some 
demonstration that the new code can do something that the old code can't (aka 
tests)
   
   3. Architecture support: Now it can compile and run on big-endian 
architectures. Still, we have work to do there but we will get there 
eventually. For big-endian tests are written in c7428fb . Moreover, I think we 
should write more generic implementations, like how we are doing over the last 
4 months, and still support platforms that we have promised. Personally, I 
don't want to write too much architecture-specific code in Rust to make it work 
over the upcoming months, and I can also advocate for that for the members of 
the Arrow Rust team. In the C++ version, I saw these and you can infer from how 
it is hard to support multiple platforms: 
https://github.com/apache/arrow/pull/7507/files#diff-c3b0484ad8586ff46fa035d446a7d1c3a30cd35d13cd05678c99814938e07d5bR78-R214
   
   4. Preventing bugs: As you can see the C++ implementation's sophisticated 
code, it is really easy to make mistakes in this field, while doing bit 
shaking, bit twiddling etc. You might carry one bit right but you don't 
consider the carry and it works for a long time until we realize that it is not 
working anymore. So abstracting some things from the development is always good 
from my point of view. And I find this pragmatically correct for this case.
   
   Here you can see mips (be) test results:
   ```
      Compiling arrow v3.0.0-SNAPSHOT (/project)
       Finished test [unoptimized + debuginfo] target(s) in 5.64s
        Running /target/mips-unknown-linux-gnu/debug/deps/arrow-ba04cf069343d58e
   
   running 6 tests
   test 
util::bit_slice_iterator::tests_bit_slices_big_endian::test_bit_slice_iter_aligned
 ... ok
   test 
util::bit_slice_iterator::tests_bit_slices_big_endian::test_bit_slice_iter_reinterpret
 ... ok
   test 
util::bit_slice_iterator::tests_bit_slices_big_endian::test_bit_slice_iter_unaligned
 ... ok
   test 
util::bit_slice_iterator::tests_bit_slices_big_endian::test_bit_slice_iter_unaligned_remainder_1_byte
 ... ok
   test 
util::bit_slice_iterator::tests_bit_slices_big_endian::test_bit_slice_iter_unaligned_remainder_bits_across_bytes
 ... ok
   test 
util::bit_slice_iterator::tests_bit_slices_big_endian::test_bit_slice_iter_unaligned_remainder_bits_large
 ... ok
   ```
   
   Here you can see armv7 (le) test results:
   ```
       Finished test [unoptimized + debuginfo] target(s) in 0.11s
        Running 
/target/armv7-unknown-linux-gnueabihf/debug/deps/arrow-6ffb743de7744875
   
   running 6 tests
   test 
util::bit_slice_iterator::tests_bit_slices_little_endian::test_bit_slice_iter_aligned
 ... ok
   test 
util::bit_slice_iterator::tests_bit_slices_little_endian::test_bit_slice_iter_reinterpret
 ... ok
   test 
util::bit_slice_iterator::tests_bit_slices_little_endian::test_bit_slice_iter_unaligned
 ... ok
   test 
util::bit_slice_iterator::tests_bit_slices_little_endian::test_bit_slice_iter_unaligned_remainder_1_byte
 ... ok
   test 
util::bit_slice_iterator::tests_bit_slices_little_endian::test_bit_slice_iter_unaligned_remainder_bits_across_bytes
 ... ok
   test 
util::bit_slice_iterator::tests_bit_slices_little_endian::test_bit_slice_iter_unaligned_remainder_bits_large
 ... ok
   ```
   
   > If the PR's aim is to make the code easier to understand, I would expect 
some description of why the new code is easier to understand than the old 
(which will be a subjective judgement, for sure).
   
   4. Flexibility: You can see byte reinterpretation as I've mentioned/promised 
before in the tests contained in commit c7428fb . Moreover, you can see that 
the new implementation without comments is only 100 lines exact. Also, views, 
buffers, iterators, bit sequence interpretation is completely separate. 
Obviously, as you said, that is subjective to the reader. I find the separation 
better atm.
   
   > Since this PR seems to have elements of all three goals, but is light on 
the explanation, I am struggling to evaluate it concisely
   
   I hope I have answered all your questions.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to