bjchambers commented on issue #514: URL: https://github.com/apache/arrow-rs/issues/514#issuecomment-873427233
I think the problem here may be more widespread than I originall thought. It looks like the equality methods are inconsistent in how they handle offset. For instance: ``` pub fn equal(lhs: &ArrayData, rhs: &ArrayData) -> bool { let lhs_nulls = lhs.null_buffer(); let rhs_nulls = rhs.null_buffer(); utils::base_equal(lhs, rhs) && lhs.null_count() == rhs.null_count() && utils::equal_nulls(lhs, rhs, lhs_nulls, rhs_nulls, 0, 0, lhs.len()) && equal_values(lhs, rhs, lhs_nulls, rhs_nulls, 0, 0, lhs.len()) } ``` This starts has `lhs_start` and `rhs_start` as 0. This would suggest that `equal_nulls` and `equal_values` should add the start to the offset. ``` #[inline] pub(super) fn equal_nulls( lhs: &ArrayData, rhs: &ArrayData, lhs_nulls: Option<&Buffer>, rhs_nulls: Option<&Buffer>, lhs_start: usize, rhs_start: usize, len: usize, ) -> bool { let lhs_null_count = count_nulls(lhs_nulls, lhs_start, len); let rhs_null_count = count_nulls(rhs_nulls, rhs_start, len); if lhs_null_count > 0 || rhs_null_count > 0 { let lhs_values = lhs_nulls.unwrap().as_slice(); let rhs_values = rhs_nulls.unwrap().as_slice(); equal_bits( lhs_values, rhs_values, lhs_start + lhs.offset(), rhs_start + rhs.offset(), len, ) } else { true } } ``` This uses the `lhs_start` directly for counting nulls (which doesn't add the offset), but then adds the offset in when calling `equal_bits`. Looking at other parts of equality (such as boolean, primitives, etc.) it seems that it is inconsistent as to whether the offset should be included or not in the start values. I can a stab at fixing it for at least the case identified, but I'm somewhat curious what the intended approach is. It seems like there are two options: 1. The `lhs_start` and `rhs_start` already include the offset, so none of the helper methods should add it again. This seems like it could include the offset just once (up front) and then everything else wouldn't need to, but it may run into more problems if some of the helper methods do include the offset (eg., running off the end of the array, etc.). Also may make it harder to use the `lhs_start` and `rhs_start` with public methods on the array data that respect offset. 2. The `lhs_start` and `rhs_start` are always relative the corresponding offset. When indexing into the data using methods that aren't relative the offset, it needs to be added. It seems like option 2 is most consistent with how things are currently implemented. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org