nevi-me commented on a change in pull request #8546:
URL: https://github.com/apache/arrow/pull/8546#discussion_r513983198
##########
File path: rust/arrow/src/array/data.rs
##########
@@ -209,6 +209,61 @@ impl ArrayData {
}
}
+impl PartialEq for ArrayData {
+ fn eq(&self, other: &Self) -> bool {
+ assert_eq!(
+ self.data_type(),
+ other.data_type(),
+ "Data types not the same"
+ );
+ assert_eq!(self.len(), other.len(), "Lengths not the same");
+ // TODO: when adding tests for this, test that we can compare with
arrays that have offsets
+ assert_eq!(self.offset(), other.offset(), "Offsets not the same");
+ assert_eq!(self.null_count(), other.null_count());
+ // compare buffers excluding padding
+ let self_buffers = self.buffers();
+ let other_buffers = other.buffers();
+ assert_eq!(self_buffers.len(), other_buffers.len());
+ self_buffers.iter().zip(other_buffers).for_each(|(s, o)| {
+ compare_buffer_regions(
+ s,
+ self.offset(), // TODO mul by data length
+ o,
+ other.offset(), // TODO mul by data len
+ );
+ });
+ // assert_eq!(self.buffers(), other.buffers());
+
+ assert_eq!(self.child_data(), other.child_data());
+ // null arrays can skip the null bitmap, thus only compare if there
are no nulls
+ if self.null_count() != 0 || other.null_count() != 0 {
+ compare_buffer_regions(
+ self.null_buffer().unwrap(),
+ self.offset(),
+ other.null_buffer().unwrap(),
+ other.offset(),
+ )
+ }
Review comment:
Hi @jorgecarleitao I introduced these changes in #8200 because we were
failing on integration tests where:
* We create an Arrow array which has no nulls from JSON, and don't allocate
a null buffer (as it's optional
http://arrow.apache.org/docs/format/Columnar.html#validity-bitmaps)
* When reading IPC files & streams, the C++ always has the null buffer
allocated
So without changing that to account for the above, both the Parquet
roundtrips and integration tests were failing even though we were trying to
compare logically correct data.
On panics, there's a slight leakage of intention, in the sense that much of
the equality was introduced to aid in testing, but because we don't yet have a
logging mechanism, something like the below:
```rust
let a: Int32Array = ...;
let b: Int32Array = ..;;
// try compare in tests
assert_eq!(&a, &b);
```
Would fail with a panic on the assert if the arrays are unequal, but because
the comparison would return a `bool`, one wouldn't know where the failure
occurred.
So this becomes tough to see why tests are failing, and hence the practise
(I'm fine with calling it bad as I still do it) of adding asserts.
I'll work with you on #8541 to find a solution that can benefit both us when
testing, and downstream users if they want a panic-free
`Array:Ref:equals(other: &ArrayRef) -> bool;` kind of interface.
> If it is trying to achieve logical equality, then it has a lot of issues
as buffer and child equality are not that simple.
I appreciate this, and the intention is to achieve logical equality even
with slices. I believe that the path that I took, while still incomplete, would
help us to get there.
The lack of offset handling from that `TODO` could have been motivation not
to merge the parquet branch in, but with all the refactoring work happening
elsewhere, it's really become very inconvenient to keep rebasing and handling
merge conflicts every few other days.
I checked that tests were passing on both branches before the merge, and in
master after the merge.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]