nevi-me commented on a change in pull request #8546:
URL: https://github.com/apache/arrow/pull/8546#discussion_r513983198



##########
File path: rust/arrow/src/array/data.rs
##########
@@ -209,6 +209,61 @@ impl ArrayData {
     }
 }
 
+impl PartialEq for ArrayData {
+    fn eq(&self, other: &Self) -> bool {
+        assert_eq!(
+            self.data_type(),
+            other.data_type(),
+            "Data types not the same"
+        );
+        assert_eq!(self.len(), other.len(), "Lengths not the same");
+        // TODO: when adding tests for this, test that we can compare with 
arrays that have offsets
+        assert_eq!(self.offset(), other.offset(), "Offsets not the same");
+        assert_eq!(self.null_count(), other.null_count());
+        // compare buffers excluding padding
+        let self_buffers = self.buffers();
+        let other_buffers = other.buffers();
+        assert_eq!(self_buffers.len(), other_buffers.len());
+        self_buffers.iter().zip(other_buffers).for_each(|(s, o)| {
+            compare_buffer_regions(
+                s,
+                self.offset(), // TODO mul by data length
+                o,
+                other.offset(), // TODO mul by data len
+            );
+        });
+        // assert_eq!(self.buffers(), other.buffers());
+
+        assert_eq!(self.child_data(), other.child_data());
+        // null arrays can skip the null bitmap, thus only compare if there 
are no nulls
+        if self.null_count() != 0 || other.null_count() != 0 {
+            compare_buffer_regions(
+                self.null_buffer().unwrap(),
+                self.offset(),
+                other.null_buffer().unwrap(),
+                other.offset(),
+            )
+        }

Review comment:
       Hi @jorgecarleitao I introduced these changes in #8200 because we were 
failing on integration tests where:
   
   * We create an Arrow array which has no nulls from JSON, and don't allocate 
a null buffer (as it's optional 
http://arrow.apache.org/docs/format/Columnar.html#validity-bitmaps)
   * When reading IPC files & streams, the C++ always has the null buffer 
allocated
   
   So without changing that to account for the above, both the Parquet 
roundtrips and integration tests were failing even though we were trying to 
compare logically correct data.
   
   On panics, there's a slight leakage of intention, in the sense that much of 
the equality was introduced to aid in testing, but because we don't yet have a 
logging mechanism, something like the below:
   
   ```rust
   let a: Int32Array = ...;
   let b: Int32Array = ..;;
   // try compare in tests
   assert_eq!(&a, &b);
   ```
   
   Would fail with a panic on the assert if the arrays are unequal, but because 
the comparison would return a `bool`, one wouldn't know where the failure 
occurred.
   So this becomes tough to see why tests are failing, and hence the practise 
(I'm fine with calling it bad as I still do it) of adding asserts.
   I'll work with you on #8541 to find a solution that can benefit both us when 
testing, and downstream users if they want a panic-free 
`Array:Ref:equals(other: &ArrayRef) -> bool;` kind of interface.
   
   > If it is trying to achieve logical equality, then it has a lot of issues 
as buffer and child equality are not that simple.
   
   I appreciate this, and the intention is to achieve logical equality even 
with slices. I believe that the path that I took, while still incomplete, would 
help us to get there.
   
   The lack of offset handling from that `TODO` could have been motivation not 
to merge the parquet branch in, but with all the refactoring work happening 
elsewhere, it's really become very inconvenient to keep rebasing and handling 
merge conflicts every few other days.
   
   I checked that tests were passing on both branches before the merge, and in 
master after the merge.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to