paleolimbot commented on code in PR #276:
URL: https://github.com/apache/arrow-nanoarrow/pull/276#discussion_r1296220191
##########
src/nanoarrow/buffer_inline.h:
##########
@@ -236,6 +242,41 @@ static inline int8_t ArrowBitGet(const uint8_t* bits,
int64_t i) {
return (bits[i >> 3] >> (i & 0x07)) & 1;
}
+static inline void ArrowBitsGet(const uint8_t* bits, int64_t start_offset,
int64_t length,
+ int8_t* out) {
Review Comment:
Maybe `ArrowBitUnpackInt8()`? (As a follow-up I'll add the `Int32()` version)
##########
src/nanoarrow/buffer_inline.h:
##########
@@ -236,6 +242,41 @@ static inline int8_t ArrowBitGet(const uint8_t* bits,
int64_t i) {
return (bits[i >> 3] >> (i & 0x07)) & 1;
}
+static inline void ArrowBitsGet(const uint8_t* bits, int64_t start_offset,
int64_t length,
+ int8_t* out) {
+ if (length == 0) {
+ return;
+ }
+
+ const uint8_t* bits_cursor = bits;
+ int64_t n_remaining = length;
+ int8_t* out_cursor = out;
Review Comment:
Do you mind using the same terminology/approach as `ArrowBitCountSet()`? (
https://github.com/apache/arrow-nanoarrow/blob/main/src/nanoarrow/buffer_inline.h#L297-L302
). I recently had to fix a segfault resulting from index math on the
first/middle/last byte and while I'm not sure the approach there is better than
what you have here, it will probably help fix the next issue with either
function to have the code look similar for both functions.
##########
src/nanoarrow/buffer_test.cc:
##########
@@ -271,6 +271,65 @@ TEST(BitmapTest, BitmapTestElement) {
EXPECT_EQ(ArrowBitGet(bitmap, 16 + 7), 0);
}
+TEST(BitmapTest, BitmapTestBitsGet) {
+ uint8_t bitmap[10];
+ int8_t result[sizeof(bitmap) * 8];
+
+ memset(bitmap, 0xff, sizeof(bitmap));
+ ArrowBitsGet(bitmap, 0, sizeof(result), result);
+ for (int i = 0; i < sizeof(result); i++) {
+ EXPECT_EQ(result[i], 1);
+ }
+
+ bitmap[2] = 0xfd;
+ ArrowBitsGet(bitmap, 0, sizeof(result), result);
+ EXPECT_EQ(result[16 + 0], 1);
Review Comment:
Would an expectation in the form
`EXPECT_EQ(std::basic_string<uint8_t>(result, n),
std::basic_string<uint8_t>({0, 1, 1, 0, 0, 0}))` be any more readable? It might
scale better to include more cases (e.g., offset != 0, all bits within the same
byte)
##########
src/nanoarrow/buffer_inline.h:
##########
@@ -222,6 +222,12 @@ static inline int64_t _ArrowBytesForBits(int64_t bits) {
return (bits >> 3) + ((bits & 7) != 0);
}
+static inline void _ArrowBitmapUnpackInt8(const uint8_t* bits, int8_t* out) {
+ for (int i = 0; i < 8; i++) {
+ out[i] = (*bits >> i) & 1;
+ }
Review Comment:
I don't know the details well enough to know if it matters, but Arrow C++
writes out all N lines explicitly (
https://github.com/apache/arrow/blob/main/cpp/src/arrow/util/bpacking_default.h#L35-L105
)
##########
src/nanoarrow/buffer_test.cc:
##########
@@ -271,6 +271,65 @@ TEST(BitmapTest, BitmapTestElement) {
EXPECT_EQ(ArrowBitGet(bitmap, 16 + 7), 0);
}
+TEST(BitmapTest, BitmapTestBitsGet) {
+ uint8_t bitmap[10];
+ int8_t result[sizeof(bitmap) * 8];
Review Comment:
You can probably test all the cases with just 3 bytes (which might make the
expectations a little less verbose)? I recently fixed a segfault in the first
byte/middle byte/last byte logic in `ArrowBitCountSet()` and it may be worth
testing that here, too (i.e., offset == 0, offset > 0 that does not align on a
byte boundary, offset + length ends on a byte boundary, offset + length does
*not* end on a byte boundary, offset + length refer to bits in the same byte).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]