alamb commented on a change in pull request #810:
URL: https://github.com/apache/arrow-rs/pull/810#discussion_r739647091
##########
File path: arrow/src/array/data.rs
##########
@@ -559,6 +570,458 @@ impl ArrayData {
)
}
}
+
+ /// "cheap" validation of an `ArrayData`. Ensures buffers are
+ /// sufficiently sized to store `len` + `offset` total elements of
+ /// `data_type` and performs other inexpensive consistency checks.
+ ///
+ /// This check is "cheap" in the sense that it does not validate the
+ /// contents of the buffers (e.g. that all offsets for UTF8 arrays
+ /// are within the bounds of the values buffer).
+ ///
+ /// TODO: add a validate_full that validates the offsets
+ pub fn validate(&self) -> Result<()> {
+ // Need at least this much space in each buffer
+ let len_plus_offset = self.len + self.offset;
+
+ // Check that the data layout conforms to the spec
+ let layout = layout(&self.data_type);
+
+ // Handling of nulls in `UnionArray` does not seem to conform
+ // to the arrow spec, so skip this check
+ //
+ // Tracking tickets:
+ //
+ // https://github.com/apache/arrow-rs/issues/814
+ // https://github.com/apache/arrow-rs/issues/85
+ if matches!(&self.data_type, DataType::Union(..)) {
+ return Ok(());
Review comment:
I finally settled on leaving `UnionArray` unvalidated in this PR so
that I can backport it to the 6.x release line (the change is backwards
compatible). In a separate, backwards-incompatible PR I will fix up the
`UnionArray` implementation (and its validation).
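
To illustrate the shape of the check discussed above, here is a minimal, self-contained sketch. It is not the code in this PR: `SimpleType` and `SimpleArrayData` are hypothetical stand-ins for the real `DataType` and `ArrayData`, and the sketch assumes a single fixed-width values buffer. It shows only the two ideas from the diff: the early return for unions, and the "cheap" size check against `len + offset`.

```rust
// Hypothetical, simplified stand-ins for illustration only; these are not
// the real arrow-rs `DataType` / `ArrayData` definitions.
#[derive(Debug)]
enum SimpleType {
    Int32,
    Int64,
    Union,
}

struct SimpleArrayData {
    data_type: SimpleType,
    len: usize,
    offset: usize,
    /// Size in bytes of the single values buffer (an assumption of this sketch).
    buffer_len_bytes: usize,
}

impl SimpleArrayData {
    /// "Cheap" validation: checks only that the buffer is large enough
    /// for `len + offset` elements; buffer contents are never inspected.
    fn validate(&self) -> Result<(), String> {
        // Unions are skipped entirely, mirroring the early return in the diff.
        let width = match self.data_type {
            SimpleType::Union => return Ok(()),
            SimpleType::Int32 => 4,
            SimpleType::Int64 => 8,
        };

        // Need at least this many elements' worth of space in the buffer.
        let len_plus_offset = self
            .len
            .checked_add(self.offset)
            .ok_or_else(|| "len + offset overflows usize".to_string())?;
        let required = len_plus_offset
            .checked_mul(width)
            .ok_or_else(|| "required byte size overflows usize".to_string())?;

        if self.buffer_len_bytes < required {
            return Err(format!(
                "buffer of {} bytes is too small for {:?}: need {} bytes",
                self.buffer_len_bytes, self.data_type, required
            ));
        }
        Ok(())
    }
}
```

With these stand-ins, a union value passes `validate()` regardless of its buffer size, which is exactly the gap the follow-up PR is meant to close.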
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.