kevingurney opened a new pull request, #37013:
URL: https://github.com/apache/arrow/pull/37013
### Rationale for this change
To continue building out the tabular APIs for the MATLAB interface, this PR
adds a new `arrow.tabular.Schema` class which wraps one or more
`arrow.type.Field` objects and semantically describes the names and types of
the columns of a tabular Arrow data structure.
To construct an `arrow.tabular.Schema` object, client code can call an
associated `arrow.schema` construction function (similar to the `arrow.field`
construction function).
This mirrors the tabular APIs in other Arrow bindings, like `pyarrow`.
### What changes are included in this PR?
1. New `arrow.tabular.Schema` class.
2. New `arrow.schema(fields)` construction function for creating instances
of `arrow.tabular.Schema`.
**Example**:
```matlab
>> fieldA = arrow.field("A", arrow.uint8);
>> fieldB = arrow.field("B", arrow.string);
>> fieldC = arrow.field("C", arrow.timestamp);
>> fields = [fieldA, fieldB, fieldC];
>> schema = arrow.schema(fields)
schema =

    A: uint8
    B: string
    C: timestamp[us]

>> schema.NumFields

ans =

  int32

   3

>> schema.FieldNames

ans =

  1×3 string array

    "A"    "B"    "C"

>> f = schema.field(3)

f =

    C: timestamp[us]

>> f = schema.field("B")

f =

    B: string
```
### Are these changes tested?
Yes.
1. Added a new test class `tSchema.m` which contains tests for
`arrow.schema` and `arrow.tabular.Schema`.
### Are there any user-facing changes?
Yes.
1. New public `arrow.tabular.Schema` class.
1.1 **Properties**
1.1.1 `NumFields`
1.1.2 `FieldNames`
1.1.3 `Fields`
1.2 **Methods**
1.2.1 `field(index)`, where `index` is either a valid numeric index or a
field name.
2. New public `arrow.schema(fields)` construction function.
### Future Directions
1. @sgilmore10 introduced some new input validation functions that are
generic and reusable in #36978. To avoid using multiple different approaches to
input validation across the MATLAB code base, it would be a good idea to
re-implement the input validation for `Schema` methods (e.g. `field`) to use
these validation functions consistently.
2. Error handling in some edge cases is less than ideal right now for
`Schema`. We should consider doing a more thorough review of error handling and
error messages across the MATLAB code base now that we have more APIs and have
seen several similar error states appear in different parts of the code base
(e.g. indexing errors).
3. We may want to consider alternative construction syntaxes beyond just
`arrow.schema(fields)`. For example,
`arrow.schema(fieldName_1, fieldType_1, ..., fieldName_i, fieldType_i, ..., fieldName_n, fieldType_n)`
might be another convenient syntax that we could consider supporting.
4. We should add a `Schema` property to `RecordBatch`.
5. Consider adding a `toMATLAB` method for `Schema` which returns an empty
MATLAB `table` with corresponding variable names and MATLAB types.
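
To make the name/type pair syntax mentioned above more concrete, here is a minimal sketch of how it could be prototyped on top of the APIs added in this PR. The function name `schemaFromPairs` and its error identifier are hypothetical and not part of this change; the sketch only assumes that `arrow.field(name, type)` and `arrow.schema(fields)` behave as shown in the example earlier in this description.

```matlab
function schema = schemaFromPairs(varargin)
%SCHEMAFROMPAIRS Hypothetical helper; NOT part of this PR.
%   schema = schemaFromPairs(name1, type1, ..., nameN, typeN) builds an
%   arrow.tabular.Schema from alternating field names and Arrow types.

    % Require a non-empty, even-length list of (name, type) arguments.
    if nargin == 0 || mod(nargin, 2) ~= 0
        error("schemaFromPairs:InvalidPairs", ...
            "Expected one or more (fieldName, fieldType) pairs.");
    end

    names = varargin(1:2:end);   % odd positions: field names
    types = varargin(2:2:end);   % even positions: Arrow type objects

    % Build one arrow.type.Field per pair using the existing
    % arrow.field construction function.
    fieldCell = cellfun(@arrow.field, names, types, "UniformOutput", false);

    % Concatenate the fields into an array (as in [fieldA, fieldB, fieldC])
    % and delegate to the arrow.schema(fields) syntax.
    schema = arrow.schema([fieldCell{:}]);
end
```

Under those assumptions, `schemaFromPairs("A", arrow.uint8, "B", arrow.string)` would be equivalent to building the two fields by hand and passing them to `arrow.schema`. Whether this belongs as an extra `arrow.schema` syntax or as a separate function is left open here.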