kevingurney opened a new issue, #35676:
URL: https://github.com/apache/arrow/issues/35676
### Describe the enhancement requested
This is a follow up to the initial null value handling support that was
added in #35598.
In order to give clients more flexibility in how null values in MATLAB
arrays are detected when constructing an `arrow.array.Array`, it would be
helpful to expose a few name-value pairs on the `arrow.array.Array` class (and
concrete subclasses).
**Two possible name-value pairs for handling null value detection when
constructing an `arrow.array.Array` are described below.**
## `DetectNulls`
**Supported values**: `true | false`
`true` - "automatically" detect null values in the input MATLAB array based
on the default value (if any) of `NullDetectionFcn`. For example, for
`arrow.array.Float64Array`, `DetectNulls` would default to `true` and
`NullDetectionFcn` would default to `@isnan`. This would mean that any `NaN`
values in the input MATLAB `double` array will be treated as null values when
constructing an `arrow.array.Float64Array`.
`false` - Do not "automatically" detect null values. For some types (e.g.
`arrow.array.ListArray`), if `DetectNulls = false` and there are nonconvertible
values (e.g. `<missing>`) in the input MATLAB array, then an error would be
thrown. We are still thinking through the design for how users can work around
this case.
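As a rough sketch of the proposed `true`/`false` semantics, the snippet below resolves the two name-value pairs into a per-element null mask using plain variables. The names `detectNulls`, `nullDetectionFcn`, `nullMask`, and `validity` are illustrative only. It does not call the MATLAB Interface to Arrow, and it ignores the error case for nonconvertible values described above.

```matlab
% Illustrative only: plain variables stand in for the proposed
% DetectNulls and NullDetectionFcn name-value pairs.
data = [1; NaN; 3; NaN];
detectNulls = true;           % proposed DetectNulls value
nullDetectionFcn = @isnan;    % proposed NullDetectionFcn default for double

if detectNulls
    % Let the detection function decide which elements are null.
    nullMask = nullDetectionFcn(data);
else
    % DetectNulls=false: no element is treated as null.
    nullMask = false(size(data));
end

% Arrow's validity bitmap marks non-null slots, i.e. the complement.
validity = ~nullMask;
```

Setting `detectNulls = false` in this sketch leaves every element valid, regardless of `nullDetectionFcn`.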
**Example:**
```matlab
>> matlabArray = string(["A", missing, "C", missing])'
matlabArray =
4x1 string array
"A"
<missing>
"C"
<missing>
% DetectNulls=true (also the proposed default) treats <missing> as null
>> arrowArray = arrow.array.StringArray(matlabArray, DetectNulls=true)
[
"A",
null,
"C",
null
]
```
**Note**: it most likely makes sense for different `arrow.array.Array`
subclasses to have different default values for `DetectNulls` and
`NullDetectionFcn`. For example, it doesn't make sense to set
`DetectNulls=true` by default for `arrow.array.Int8Array` since there is no
concept of "null-ability" or "missing-ness" for MATLAB integer types. On the
other hand, `ismissing(double([1, NaN, 3]))` in MATLAB returns
`logical([0, 1, 0])` because `NaN` is treated as a "missing" value. See
https://www.mathworks.com/help/matlab/data_analysis/missing-data-in-matlab.html
for more information.
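The difference between types can be checked directly with `ismissing`. The `defaults` struct at the end is a hypothetical way to record per-class defaults for the proposed name-value pairs. The class names come from this issue, but the values are illustrative assumptions, not a settled design.

```matlab
% NaN is the standard missing value for double ...
doubleMask = ismissing(double([1, NaN, 3]));   % logical([0 1 0])
% ... but integer types have no standard missing value.
int8Mask = ismissing(int8([1, 2, 3]));         % logical([0 0 0])

% Hypothetical per-class defaults (illustrative, not a settled design).
% [] means "no default NullDetectionFcn" for that class.
defaults = struct( ...
    "Float64Array", struct("DetectNulls", true,  "NullDetectionFcn", @isnan), ...
    "Int8Array",    struct("DetectNulls", false, "NullDetectionFcn", []));
```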
## `NullDetectionFcn`
**Supported values**: `function_handle` that takes one input (a vector) and
returns a `logical` vector
A `function_handle` used for "detecting" values that should be treated as
null when constructing an `arrow.array.Array`. For example, when set to
`@isnan`, all `NaN` values in an input MATLAB `double` array would be treated
as null when constructing an `arrow.array.Float64Array`.
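A user-supplied `NullDetectionFcn` would need to honor that contract: one vector in, one `logical` vector of the same size out. The sketch below checks this with plain `assert` calls. The NaN-or-Inf predicate is just an example of a custom function, and the checks are a hypothetical illustration, not validation code from the MATLAB Interface to Arrow. Note the element-wise `|`: MATLAB's short-circuit `||` accepts only scalar operands and errors on vector input.

```matlab
% Example custom predicate: treat both NaN and +/-Inf as null.
nullDetectionFcn = @(x) isnan(x) | isinf(x);

data = [1.5; NaN; -2; -Inf];
nullMask = nullDetectionFcn(data);

% Proposed contract: one logical value per input element.
assert(islogical(nullMask), "NullDetectionFcn must return a logical array.");
assert(isequal(size(nullMask), size(data)), ...
    "NullDetectionFcn must return one value per input element.");

numNulls = nnz(nullMask);   % 2 (the NaN and the -Inf)
```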
**Example:**
```matlab
>> matlabArray = string(["A", "B", "INVALID", "D", "ERROR", "F"])'
matlabArray =
6x1 string array
"A"
"B"
"INVALID"
"D"
"ERROR"
"F"
>> nullDetectionFcn = @(s) strcmp(s, "INVALID") | strcmp(s, "ERROR")
nullDetectionFcn =
function_handle with value:
@(s)strcmp(s,"INVALID")|strcmp(s,"ERROR")
% Detects any strings with the values "INVALID" or "ERROR" as null values
>> arrowArray = arrow.array.StringArray(matlabArray, NullDetectionFcn=nullDetectionFcn)
[
"A",
"B",
null,
"D",
null,
"F"
]
```
---
### Component(s)
MATLAB