tustvold commented on code in PR #4838:
URL: https://github.com/apache/arrow-rs/pull/4838#discussion_r1331544518
##########
arrow-array/src/array/mod.rs:
##########
@@ -173,52 +173,60 @@ pub trait Array: std::fmt::Debug + Send + Sync {
/// ```
fn offset(&self) -> usize;
- /// Returns the null buffer of this array if any
+ /// Returns the null buffer of this array if any.
///
- /// Note: some arrays can encode their nullability in their children, for
example,
+ /// The null buffer encodes the "physical" nulls of an array and is
pre-computed and very efficient.
+ /// However, some arrays can also encode nullability in their children,
for example,
/// [`DictionaryArray::values`] values or [`RunArray::values`], or without
a null buffer,
- /// such as [`NullArray`]. Use [`Array::logical_nulls`] to obtain a
computed mask encoding this
+ /// such as [`NullArray`]. To determine if each element of such an array
is logically null,
+ /// you can use the slower [`Array::logical_nulls`] to obtain a computed
mask .
fn nulls(&self) -> Option<&NullBuffer>;
- /// Returns the logical null buffer of this array if any
+ /// Returns a potentially computed [`NullBuffer`] that represent the
logical null values of this array, if any.
///
/// In most cases this will be the same as [`Array::nulls`], except for:
///
- /// * DictionaryArray where [`DictionaryArray::values`] contains nulls
- /// * RunArray where [`RunArray::values`] contains nulls
- /// * NullArray where all indices are nulls
+ /// * ['DictionaryArray`] where [`DictionaryArray::values`] contains nulls
+ /// * [`RunArray`] where [`RunArray::values`] contains nulls
+ /// * [`NullArray`] where all indices are nulls
///
/// In these cases a logical [`NullBuffer`] will be computed, encoding the
logical nullability
/// of these arrays, beyond what is encoded in [`Array::nulls`]
fn logical_nulls(&self) -> Option<NullBuffer> {
self.nulls().cloned()
}
- /// Returns whether the element at `index` is null.
- /// When using this function on a slice, the index is relative to the
slice.
+ /// Returns whether the element at `index` is "physically" null, as
explained
+ /// on [`Array::nulls`].
///
- /// Note: this method returns the physical nullability, i.e. that encoded
in [`Array::nulls`]
- /// see [`Array::logical_nulls`] for logical nullability
+ /// Note: For performance reasons, logical nullability can and does differ
from the physical
+ /// nullability for some types. This difference can lead to surprising
results,
+ /// such as [`NullArray::is_null`] always
+ /// returns `false`. Other arrays which may have surprising results are
[`DictionaryArray]` and
+ /// [`RunArray`]. See [`Self::logical_nulls`] for logical nullability.
Review Comment:
```suggestion
/// Note: For performance reasons, this method returns nullability
solely as determined by the
/// null buffer. This difference can lead to surprising results, for
example, [`NullArray::is_null`] always
/// returns `false` as the array lacks a null buffer. Similarly
[`DictionaryArray]` and [`RunArray`] may
/// encode nullability in their children. See [`Self::logical_nulls`]
for more information.
```
I tried to make this more clear on the why
##########
arrow-array/src/array/mod.rs:
##########
@@ -173,52 +173,60 @@ pub trait Array: std::fmt::Debug + Send + Sync {
/// ```
fn offset(&self) -> usize;
- /// Returns the null buffer of this array if any
+ /// Returns the null buffer of this array if any.
///
- /// Note: some arrays can encode their nullability in their children, for
example,
+ /// The null buffer encodes the "physical" nulls of an array and is
pre-computed and very efficient.
+ /// However, some arrays can also encode nullability in their children,
for example,
/// [`DictionaryArray::values`] values or [`RunArray::values`], or without
a null buffer,
- /// such as [`NullArray`]. Use [`Array::logical_nulls`] to obtain a
computed mask encoding this
+ /// such as [`NullArray`]. To determine if each element of such an array
is logically null,
+ /// you can use the slower [`Array::logical_nulls`] to obtain a computed
mask .
fn nulls(&self) -> Option<&NullBuffer>;
- /// Returns the logical null buffer of this array if any
+ /// Returns a potentially computed [`NullBuffer`] that represent the
logical null values of this array, if any.
///
/// In most cases this will be the same as [`Array::nulls`], except for:
///
- /// * DictionaryArray where [`DictionaryArray::values`] contains nulls
- /// * RunArray where [`RunArray::values`] contains nulls
- /// * NullArray where all indices are nulls
+ /// * ['DictionaryArray`] where [`DictionaryArray::values`] contains nulls
+ /// * [`RunArray`] where [`RunArray::values`] contains nulls
+ /// * [`NullArray`] where all indices are nulls
///
/// In these cases a logical [`NullBuffer`] will be computed, encoding the
logical nullability
/// of these arrays, beyond what is encoded in [`Array::nulls`]
fn logical_nulls(&self) -> Option<NullBuffer> {
self.nulls().cloned()
}
- /// Returns whether the element at `index` is null.
- /// When using this function on a slice, the index is relative to the
slice.
+ /// Returns whether the element at `index` is "physically" null, as
explained
+ /// on [`Array::nulls`].
///
- /// Note: this method returns the physical nullability, i.e. that encoded
in [`Array::nulls`]
- /// see [`Array::logical_nulls`] for logical nullability
+ /// Note: For performance reasons, logical nullability can and does differ
from the physical
+ /// nullability for some types. This difference can lead to surprising
results,
+ /// such as [`NullArray::is_null`] always
+ /// returns `false`. Other arrays which may have surprising results are
[`DictionaryArray]` and
+ /// [`RunArray`]. See [`Self::logical_nulls`] for logical nullability.
+ ///
+ /// Note: When using this function on a slice, the index is relative to
the slice.
Review Comment:
```suggestion
```
This isn't true following https://github.com/apache/arrow-rs/pull/3778 which
altered the offset to only apply to the values of the array and not the null
buffer
##########
arrow-array/src/array/mod.rs:
##########
@@ -173,52 +173,60 @@ pub trait Array: std::fmt::Debug + Send + Sync {
/// ```
fn offset(&self) -> usize;
- /// Returns the null buffer of this array if any
+ /// Returns the null buffer of this array if any.
///
- /// Note: some arrays can encode their nullability in their children, for
example,
+ /// The null buffer encodes the "physical" nulls of an array and is
pre-computed and very efficient.
+ /// However, some arrays can also encode nullability in their children,
for example,
/// [`DictionaryArray::values`] values or [`RunArray::values`], or without
a null buffer,
- /// such as [`NullArray`]. Use [`Array::logical_nulls`] to obtain a
computed mask encoding this
+ /// such as [`NullArray`]. To determine if each element of such an array
is logically null,
+ /// you can use the slower [`Array::logical_nulls`] to obtain a computed
mask .
fn nulls(&self) -> Option<&NullBuffer>;
- /// Returns the logical null buffer of this array if any
+ /// Returns a potentially computed [`NullBuffer`] that represent the
logical null values of this array, if any.
///
/// In most cases this will be the same as [`Array::nulls`], except for:
///
- /// * DictionaryArray where [`DictionaryArray::values`] contains nulls
- /// * RunArray where [`RunArray::values`] contains nulls
- /// * NullArray where all indices are nulls
+ /// * ['DictionaryArray`] where [`DictionaryArray::values`] contains nulls
+ /// * [`RunArray`] where [`RunArray::values`] contains nulls
+ /// * [`NullArray`] where all indices are nulls
///
/// In these cases a logical [`NullBuffer`] will be computed, encoding the
logical nullability
/// of these arrays, beyond what is encoded in [`Array::nulls`]
fn logical_nulls(&self) -> Option<NullBuffer> {
self.nulls().cloned()
}
- /// Returns whether the element at `index` is null.
- /// When using this function on a slice, the index is relative to the
slice.
+ /// Returns whether the element at `index` is "physically" null, as
explained
+ /// on [`Array::nulls`].
Review Comment:
```suggestion
/// Returns whether the element at `index` is null according to
[`Array::nulls`]
```
I think it is better to just inline what physical nullability means, as
opposed to alluding to it
##########
arrow-array/src/array/mod.rs:
##########
@@ -173,52 +173,60 @@ pub trait Array: std::fmt::Debug + Send + Sync {
/// ```
fn offset(&self) -> usize;
- /// Returns the null buffer of this array if any
+ /// Returns the null buffer of this array if any.
///
- /// Note: some arrays can encode their nullability in their children, for
example,
+ /// The null buffer encodes the "physical" nulls of an array and is
pre-computed and very efficient.
+ /// However, some arrays can also encode nullability in their children,
for example,
Review Comment:
```suggestion
/// The null buffer encodes the "physical" nulls of an array.
/// However, some arrays can also encode nullability in their children,
for example,
```
I think the notion of it being pre-computed is a bit confusing, it isn't
really computed at all, you wouldn't describe the values as being pre-computed?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]