kevingurney opened a new pull request, #37013:
URL: https://github.com/apache/arrow/pull/37013

   ### Rationale for this change
   
   To continue building out the tabular APIs for the MATLAB interface, this PR 
adds a new `arrow.tabular.Schema` class which wraps one or more 
`arrow.type.Field` objects and semantically describes the names and types of 
the columns of a tabular Arrow data structure.
   
   To construct an `arrow.tabular.Schema` object, client code can call an 
associated `arrow.schema` construction function (similar to the `arrow.field` 
construction function).
   
   This mirrors the tabular APIs in other Arrow bindings, like `pyarrow`.
   
   ### What changes are included in this PR?
   
   1. New `arrow.tabular.Schema` class.
   2. New `arrow.schema(fields)` construction function for creating instances 
of `arrow.tabular.Schema`.
   
   **Example**:
   
   ```matlab
   >> fieldA = arrow.field("A", arrow.uint8);
   >> fieldB = arrow.field("B", arrow.string);
   >> fieldC = arrow.field("C", arrow.timestamp);
   
   >> fields = [fieldA, fieldB, fieldC];
   
   >> schema = arrow.schema(fields)
   
   schema = 
   
   A: uint8
   B: string
   C: timestamp[us]
   
   >> schema.NumFields
   
   ans =
   
     int32
   
      3
   
   >> schema.FieldNames
   
   ans = 
   
     1×3 string array
   
       "A"    "B"    "C"
   
   >> f = schema.field(3)
   
   f = 
   
   C: timestamp[us]
   
   >> f = schema.field("B")
   
   f = 
   
   B: string
   ```
   
   ### Are these changes tested?
   
   Yes.
   
   1. Added a new test class `tSchema.m` which contains tests for 
`arrow.schema` and `arrow.tabular.Schema`. 
   
   ### Are there any user-facing changes?
   
   Yes.
   
   1. New public `arrow.tabular.Schema` class.
       1.1 **Properties**
           1.1.1 `NumFields`
           1.1.2 `FieldNames`
           1.1.3 `Fields`
       1.2 **Methods**
           1.2.1 `field(index)`, where `index` is a valid numeric index or 
field name.
   2. New public `arrow.schema(fields)` construction function.
   
   ### Future Directions
   
   1. @sgilmore10 introduced some new input validation functions that are 
generic and reusable in #36978. To avoid using multiple different approaches to 
input validation across the MATLAB code base, it would be a good idea to 
re-implement the input validation for `Schema` methods (e.g. `field`) to use 
these validation functions consistently.
   2. Error handling in some edge cases is less than ideal right now for 
`Schema`. We should consider doing a more thorough review of error handling and 
error messages across the MATLAB code base now that we have more APIs and have 
seen several similar error states appear in different parts of the code base 
(e.g. indexing errors).
   3. We may want to consider alternative construction syntaxes beyond just 
`arrow.schema(fields)`. For example, `arrow.schema(fieldName_1, fieldType_1, 
..., fieldName_i, fieldType_i, ..., fieldName_n, fieldType_n)` might be another 
convenient syntax that we could consider supporting.
   4. We should add a `Schema` property to `RecordBatch`.
   5. Consider adding a `toMATLAB` method for `Schema` which returns an empty 
MATLAB `table` with corresponding variable names and MATLAB types.
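   
   As a rough illustration of the name/type pair syntax mentioned above, 
here is a minimal sketch of a wrapper built only on `arrow.field` and 
`arrow.schema(fields)`. The function name `schemaFromPairs` and its error 
identifier are hypothetical and not part of this PR; the sketch assumes that 
`arrow.type.Field` objects can be concatenated with `[...]`, as in the example 
earlier in this description.
   
   ```matlab
   function schema = schemaFromPairs(varargin)
   %SCHEMAFROMPAIRS Hypothetical helper, not part of this PR.
   %   schema = schemaFromPairs(name1, type1, ..., nameN, typeN) builds an
   %   arrow.tabular.Schema from alternating field names and Arrow types.
   
       % Require at least one complete name/type pair.
       if nargin == 0 || mod(nargin, 2) ~= 0
           error("schemaFromPairs:InvalidInput", ...
               "Expected one or more name/type pairs.");
       end
   
       names = varargin(1:2:end);   % odd positions: field names
       types = varargin(2:2:end);   % even positions: Arrow type objects
   
       % Create one arrow.type.Field per pair with the existing arrow.field
       % construction function.
       fields = cellfun(@(name, type) arrow.field(name, type), ...
           names, types, "UniformOutput", false);
   
       % Concatenate into a Field array and delegate to arrow.schema(fields).
       schema = arrow.schema([fields{:}]);
   end
   ```
   
   Under those assumptions, `schemaFromPairs("A", arrow.uint8, "B", 
arrow.string)` would be expected to behave like `arrow.schema([fieldA, 
fieldB])` in the example. Whether such a syntax belongs in `arrow.schema` 
itself or in a separate function is left open here.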

