kevingurney opened a new pull request, #38531:
URL: https://github.com/apache/arrow/pull/38531

   ### Rationale for this change
   
   This pull request adds a new `ValidationMode` name-value pair to the 
`arrow.array.ListArray.fromArrays` function. This allows client code to 
control how strictly the provided `offsets` and `values` arrays are validated.
   
   ### What changes are included in this PR?
   
   1. Added a new name-value pair `ValidationMode = "None" | "Minimal" 
(default) | "Full"` to the `arrow.array.ListArray.fromArrays` function. If 
`ValidationMode` is set to `"Minimal"` or `"Full"` and the provided `offsets` 
and `values` arrays are invalid, then an error will be thrown when calling the 
`arrow.array.ListArray.fromArrays` function.
   2. Set the default `ValidationMode` for `arrow.array.ListArray.fromArrays` 
to `"Minimal"` to balance usability and performance when creating `ListArray`s. 
This should help more MATLAB users navigate the complexities of 
creating `ListArray`s "from scratch" using `offsets` and `values` arrays.
   3. Added a new `arrow.array.ValidationMode` enumeration class. This is used 
as the type of the `ValidationMode` name-value pair on the 
`arrow.array.ListArray.fromArrays` function.
   
   Supported values include:
   * `arrow.array.ValidationMode.None` - Do no validation checks on the given 
`Array`.
   * `arrow.array.ValidationMode.Minimal` - Do relatively inexpensive 
validation checks on the given `Array`. Delegates to the C++ `Array::Validate` 
method under the hood.
   * `arrow.array.ValidationMode.Full` - Do expensive / robust validation 
checks on the given `Array`. Delegates to the C++ `Array::ValidateFull` method 
under the hood.
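
The difference between the three modes can be illustrated with a small, self-contained sketch. It is written in plain Python rather than MATLAB so that it runs without Arrow installed. The `ValidationMode` enum and the `validate_list_offsets` helper below are hypothetical stand-ins, not part of any Arrow API. The split between cheap endpoint checks for `Minimal` and a linear scan for `Full` is an assumption made for illustration, not a description of the checks `Array::Validate` and `Array::ValidateFull` actually perform.

```python
from enum import Enum


class ValidationMode(Enum):
    """Hypothetical Python stand-in for arrow.array.ValidationMode."""
    NONE = "None"
    MINIMAL = "Minimal"
    FULL = "Full"


def validate_list_offsets(offsets, num_values, mode=ValidationMode.MINIMAL):
    """Raise ValueError if `offsets` cannot describe lists over `num_values` values.

    Illustrative only: the real checks live in Arrow C++ and may differ.
    """
    if mode is ValidationMode.NONE:
        return  # Skip all checks; the caller vouches for the data.

    # Minimal (assumed): constant-time checks on the endpoints only.
    if len(offsets) == 0:
        raise ValueError("offsets must contain at least one element")
    if offsets[0] < 0:
        raise ValueError("first offset must be non-negative")
    if offsets[-1] > num_values:
        raise ValueError("last offset exceeds the number of values")

    if mode is ValidationMode.FULL:
        # Full (assumed): linear-time scan over every adjacent pair of offsets.
        for i in range(1, len(offsets)):
            if offsets[i] < offsets[i - 1]:
                raise ValueError(f"offsets decrease at index {i}")


# Offsets whose endpoints look fine but whose interior is not monotonic.
bad_offsets = [0, 5, 2, 3]

validate_list_offsets(bad_offsets, 3, ValidationMode.NONE)     # no error
validate_list_offsets(bad_offsets, 3, ValidationMode.MINIMAL)  # no error
try:
    validate_list_offsets(bad_offsets, 3, ValidationMode.FULL)
except ValueError as err:
    print(err)  # offsets decrease at index 2
```

In this sketch, the same malformed input passes the cheaper modes and is rejected only by the full scan, which is the trade-off the default of `"Minimal"` is meant to balance.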
   
   ### Are these changes tested?
   
   Yes.
   
   1. Added new test cases for verifying `ValidationMode` behavior to 
`tListArray.m`.
   
   ### Are there any user-facing changes?
   
   Yes.
   
   1. Client code can now control validation behavior when calling 
`arrow.array.ListArray.fromArrays`.
   2. By default, an error will now be thrown by 
`arrow.array.ListArray.fromArrays` for certain invalid combinations of 
`offsets` and `values`. In other words, `arrow.array.ListArray.fromArrays` will 
call the C++ method `Array::Validate` by default, which corresponds to 
`arrow.array.ValidationMode.Minimal`.
   3. Client code can now create `arrow.array.ValidationMode` enumeration 
values.
   
   ### Future Directions
   
   1. Currently `ValidationMode` has only been added to the 
`arrow.array.ListArray.fromArrays` method. However, in the future, it may make 
sense to generalize validation behavior and provide `ValidationMode` on other 
`fromMATLAB` and `fromArrays` methods for other `Array` types. We decided to 
start with `ListArray` as an incremental first step since we suspect creating 
valid `ListArray`s from `offsets` and `values` will generally be more error 
prone than creating simpler `Array` types like `Float64Array` or `StringArray`.
   
   ### Notes
   
   1. We chose to set the default `ValidationMode` value to 
`arrow.array.ValidationMode.Minimal` to balance usability and performance. 
If this default turns out to cause significant performance problems for many 
users, we could consider changing it to `arrow.array.ValidationMode.None` in 
the future.
   2. Thank you @sgilmore10 for your help with this pull request!

