alamb commented on code in PR #9020:
URL: https://github.com/apache/arrow-rs/pull/9020#discussion_r2644227651
##########
arrow-buffer/src/buffer/boolean.rs:
##########
@@ -26,17 +26,56 @@ use std::ops::{BitAnd, BitOr, BitXor, Not};
/// A slice-able [`Buffer`] containing bit-packed booleans
///
-/// `BooleanBuffer`s can be modified using [`BooleanBufferBuilder`]
+/// This structure represents a sequence of boolean values packed into a
+/// byte-aligned [`Buffer`]. Both the offset and length are represented in bits.
+///
+/// # Layout
+///
+/// The values are represented as little-endian bit-packed values, where the
+/// least significant bit of each byte represents the first boolean value and
+/// subsequent values proceed toward the most significant bit.
+///
+/// For example, the 10-bit bitmask `0b0111001101` has length 10 and is
+/// represented using 2 bytes with offset 0 like this:
+///
+/// ```text
+/// ┌─────────────────────────────────┐ ┌─────────────────────────────────┐
+/// │┌───┬───┬───┬───┬───┬───┬───┬───┐│ │┌───┬───┬───┬───┬───┬───┬───┬───┐│
+/// ││ 1 │ 1 │ 0 │ 0 │ 1 │ 1 │ 0 │ 1 ││ ││ ? │ ? │ ? │ ? │ ? │ ? │ 0 │ 1 ││
+/// │└───┴───┴───┴───┴───┴───┴───┴───┘│ │└───┴───┴───┴───┴───┴───┴───┴───┘│
+/// └─────────────────────────────────┘ └─────────────────────────────────┘
+///    7          Byte 0           0       7          Byte 1           0  bit
+///                                                                       offset
+/// length = 10 bits, offset = 0
+/// ```
+///
+/// The same bitmask with length 10 and offset 3 would be represented like this:
+///
+/// ```text
+/// ┌─────────────────────────────────┐ ┌─────────────────────────────────┐
+/// │┌───┬───┬───┬───┬───┬───┬───┬───┐│ │┌───┬───┬───┬───┬───┬───┬───┬───┐│
+/// ││ 0 │ 1 │ 1 │ 0 │ 1 │ ? │ ? │ ? ││ ││ ? │ ? │ ? │ 0 │ 1 │ 1 │ 1 │ 0 ││
+/// │└───┴───┴───┴───┴───┴───┴───┴───┘│ │└───┴───┴───┴───┴───┴───┴───┴───┘│
+/// └─────────────────────────────────┘ └─────────────────────────────────┘
+///    7          Byte 0           0       7          Byte 1           0  bit
+///                                                                       offset
+/// length = 10 bits, offset = 3
+/// ```
+/// Note that the bits marked `?` are not part of the (logical) mask and
+/// may contain either `0` or `1`.
///
///
/// # See Also
+/// * [`BooleanBufferBuilder`] for building [`BooleanBuffer`] instances
/// * [`NullBuffer`] for representing null values in Arrow arrays
///
/// [`NullBuffer`]: crate::NullBuffer
#[derive(Debug, Clone, Eq)]
pub struct BooleanBuffer {
+ /// Underlying buffer (byte-aligned)
buffer: Buffer,
+ /// Offset in bits (not bytes)
offset: usize,
+ /// Length in bits (not bytes)
len: usize,
Review Comment:
Yes, I think this is a good idea. I will do so as a follow-on PR.
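The following dependency-free Rust sketch makes the bit-level layout in the diff concrete. It is not the `arrow-buffer` implementation. `BitView`, `value`, `slice`, and `to_vec` are hypothetical names used only for this illustration. The byte values come from the two diagrams, with the `?` bits filled in arbitrarily. The sketch assumes the LSB-first numbering described in the `# Layout` section.

```rust
/// Hypothetical, simplified stand-in for `BooleanBuffer`: borrowed bytes plus
/// an offset and a length, both measured in bits (not bytes).
#[derive(Debug, Clone, Copy)]
struct BitView<'a> {
    /// Underlying byte-aligned storage
    data: &'a [u8],
    /// Offset in bits from the start of `data`
    offset: usize,
    /// Number of logical boolean values (bits)
    len: usize,
}

impl<'a> BitView<'a> {
    fn new(data: &'a [u8], offset: usize, len: usize) -> Self {
        // Every logical bit must fall inside `data`
        assert!(offset + len <= data.len() * 8, "bit range out of bounds");
        Self { data, offset, len }
    }

    /// Returns logical value `i`. Bits are numbered LSB-first, so physical
    /// bit `b` lives in byte `b / 8` at bit position `b % 8`.
    fn value(&self, i: usize) -> bool {
        assert!(i < self.len, "index out of bounds");
        let b = self.offset + i;
        (self.data[b / 8] >> (b % 8)) & 1 == 1
    }

    /// Zero-copy slice: only the bit offset and length change.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self { data: self.data, offset: self.offset + offset, len }
    }

    /// Collects the logical values, ignoring bits outside
    /// `offset..offset + len` (the `?` bits in the diagrams).
    fn to_vec(&self) -> Vec<bool> {
        (0..self.len).map(|i| self.value(i)).collect()
    }
}

fn main() {
    // `0b0111001101`, length 10, offset 0; the unused `?` bits of byte 1 are 0 here.
    let aligned: [u8; 2] = [0b1100_1101, 0b0000_0001];
    // Same mask at offset 3; the unused `?` bits are set to 1 to show that
    // they do not affect the logical values.
    let shifted: [u8; 2] = [0b0110_1111, 0b1110_1110];

    let a = BitView::new(&aligned, 0, 10);
    let b = BitView::new(&shifted, 3, 10);
    assert_eq!(a.to_vec(), b.to_vec());

    // Cross-check against the integer literal: logical bit i is bit i of the mask.
    let expected: Vec<bool> = (0..10).map(|i| (0b0111001101u16 >> i) & 1 == 1).collect();
    assert_eq!(a.to_vec(), expected);

    // prints [true, true, false, false]
    println!("{:?}", b.slice(2, 4).to_vec());
}
```

Both `offset` and `len` are counted in bits, so slicing only adjusts those two fields. The offset-3 bytes never need to be shifted into alignment for their logical values to compare equal to the offset-0 ones.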
--
This is an automated message from the Apache Git Service.