[GitHub] [arrow] vertexclique commented on pull request #8598: ARROW-10500: [Rust] Refactor bit slice, bit view iterator for array buffers

GitBox Fri, 06 Nov 2020 11:35:46 -0800


vertexclique commented on pull request #8598:
URL: https://github.com/apache/arrow/pull/8598#issuecomment-723261667

> So my feedback here is that it is not clear to me what this PR is trying
to accomplish (aka answer the question of why make the changes in this pr) and
thus it is not clear how to review / evaluate it.

Here comes the explanation @alamb and team:
> If the PR's aim is to fix bug, I would expect some explanation /
demonstration / of something that fails without the changes in the PR and
passes with changes in the PR. The bugs this PR's changes fixes are probably
obvious to you, but sadly they are not to me :(

1. Bugfix: Other pr (#8571) fixes one offset problem. This pr fixes it
intrinsically, tests are added for that in commit: d6e4744 . Moreover, data
fusion tests are not using machine epsilon (small read:
https://floating-point-gui.de/errors/comparison/), and I have just implemented
an assertion method to be used with data fusion tests in c108026 . That was yet
another bug. Now a small note: I didn't add these tests initially to not to be
understood as rude against @jhorstmann and the pr opened there, but as you can
see I have committed the exact tests in d6e4744 with the co-authoring feature
to make it cumulative effort.

2. Extensibility: Now iterators can be extended with different iterator
types. Now, if you want, bit view can dispense not exact size chunk iterator or
bit by bit iterator. Whatever you like. Just adding the wrapping iterator for
cases makes it easier.

> If the PR's aim is to add support for endianness, I would expect some
demonstration that the new code can do something that the old code can't (aka
tests)

3. Architecture support: Now it can compile and run on big-endian
architectures. Still, we have work to do there but we will get there
eventually. For big-endian tests are written in c7428fb . Moreover, I think we
should write more generic implementations, like how we are doing over the last
4 months, and still support platforms that we have promised. Personally, I
don't want to write too much architecture-specific code in Rust to make it work
over the upcoming months, and I can also advocate for that for the members of
the Arrow Rust team. In the C++ version, I saw these and you can infer from how
it is hard to support multiple platforms:
https://github.com/apache/arrow/pull/7507/files#diff-c3b0484ad8586ff46fa035d446a7d1c3a30cd35d13cd05678c99814938e07d5bR78-R214

4. Preventing bugs: As you can see the C++ implementation's sophisticated
code, it is really easy to make mistakes in this field, while doing bit
shaking, bit twiddling etc. You might carry one bit right but you don't
consider the carry and it works for a long time until we realize that it is not
working anymore. So abstracting some things from the development is always good
from my point of view. And I find this pragmatically correct for this case.

Here you can see mips (be) test results:
```
Compiling arrow v3.0.0-SNAPSHOT (/project)
Finished test [unoptimized + debuginfo] target(s) in 5.64s
Running /target/mips-unknown-linux-gnu/debug/deps/arrow-ba04cf069343d58e

running 6 tests
test
util::bit_slice_iterator::tests_bit_slices_big_endian::test_bit_slice_iter_aligned
... ok
test
util::bit_slice_iterator::tests_bit_slices_big_endian::test_bit_slice_iter_reinterpret
... ok
test
util::bit_slice_iterator::tests_bit_slices_big_endian::test_bit_slice_iter_unaligned
... ok
test
util::bit_slice_iterator::tests_bit_slices_big_endian::test_bit_slice_iter_unaligned_remainder_1_byte
... ok
test
util::bit_slice_iterator::tests_bit_slices_big_endian::test_bit_slice_iter_unaligned_remainder_bits_across_bytes
... ok
test
util::bit_slice_iterator::tests_bit_slices_big_endian::test_bit_slice_iter_unaligned_remainder_bits_large
... ok
```

Here you can see armv7 (le) test results:
```
Finished test [unoptimized + debuginfo] target(s) in 0.11s
Running
/target/armv7-unknown-linux-gnueabihf/debug/deps/arrow-6ffb743de7744875

running 6 tests
test
util::bit_slice_iterator::tests_bit_slices_little_endian::test_bit_slice_iter_aligned
... ok
test
util::bit_slice_iterator::tests_bit_slices_little_endian::test_bit_slice_iter_reinterpret
... ok
test
util::bit_slice_iterator::tests_bit_slices_little_endian::test_bit_slice_iter_unaligned
... ok
test
util::bit_slice_iterator::tests_bit_slices_little_endian::test_bit_slice_iter_unaligned_remainder_1_byte
... ok
test
util::bit_slice_iterator::tests_bit_slices_little_endian::test_bit_slice_iter_unaligned_remainder_bits_across_bytes
... ok
test
util::bit_slice_iterator::tests_bit_slices_little_endian::test_bit_slice_iter_unaligned_remainder_bits_large
... ok
```

> If the PR's aim is to make the code easier to understand, I would expect
some description of why the new code is easier to understand than the old
(which will be a subjective judgement, for sure).

4. Flexibility: You can see byte reinterpretation as I've mentioned/promised
before in the tests contained in commit c7428fb . Moreover, you can see that
the new implementation without comments is only 100 lines exact. Also, views,
buffers, iterators, bit sequence interpretation is completely separate.
Obviously, as you said, that is subjective to the reader. I find the separation
better atm.

> Since this PR seems to have elements of all three goals, but is light on
the explanation, I am struggling to evaluate it concisely

I hope I have answered all your questions.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] vertexclique commented on pull request #8598: ARROW-10500: [Rust] Refactor bit slice, bit view iterator for array buffers

Reply via email to