kevingurney opened a new pull request, #36190:
URL: https://github.com/apache/arrow/pull/36190

   ### Rationale for this change
   
   Now that the MATLAB interface supports some basic `arrow.array.Array` types, 
it would be helpful to start building out the tabular types (e.g. `RecordBatch` 
and `Table`) in parallel.
   
   This pull request contains a basic implementation of 
`arrow.tabular.RecordBatch` (name subject to change).
   
   ### What changes are included in this PR?
   
   1. Added new `arrow.tabular.RecordBatch` class that can be constructed from 
a MATLAB `table`.
   2. Added new test class `tRecordBatch`.
   
   ### Are these changes tested?
   
   Yes.
   
   1. Added new test class `tRecordBatch` containing basic tests for the 
`arrow.tabular.RecordBatch` class.
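   
   For readers who want a feel for the kind of check involved, here is a 
   minimal sketch of a round-trip test written against 
   `matlab.unittest.TestCase`. It is not the actual contents of 
   `tRecordBatch`: the class name `tRecordBatchRoundTripSketch` and both test 
   method names are hypothetical, and the sketch relies only on the 
   constructor, `table` conversion, `NumColumns`, and `ColumnNames` behavior 
   described in this pull request.
   
   ```matlab
   % Hypothetical sketch; save as tRecordBatchRoundTripSketch.m.
   % This is NOT the tRecordBatch class added by this pull request.
   classdef tRecordBatchRoundTripSketch < matlab.unittest.TestCase
   
       methods (Test)
   
           function RoundTripPreservesTable(testCase)
               % Build a small MATLAB table with three supported column types.
               matlabTable = table(uint64([1,2,3]'), [true false true]', ...
                   [0.1, 0.2, 0.3]', ...
                   VariableNames=["UInt64", "Boolean", "Float64"]);
   
               % Convert to an Arrow RecordBatch and back to a MATLAB table.
               arrowRecordBatch = arrow.tabular.RecordBatch(matlabTable);
               convertedMatlabTable = table(arrowRecordBatch);
   
               % The converted table should equal the original.
               testCase.verifyEqual(convertedMatlabTable, matlabTable);
           end
   
           function ColumnMetadataMatchesTable(testCase)
               matlabTable = table(uint64([1,2,3]'), [true false true]', ...
                   [0.1, 0.2, 0.3]', ...
                   VariableNames=["UInt64", "Boolean", "Float64"]);
               arrowRecordBatch = arrow.tabular.RecordBatch(matlabTable);
   
               % NumColumns is shown as an int32 in the examples in this PR.
               testCase.verifyEqual(arrowRecordBatch.NumColumns, int32(3));
               testCase.verifyEqual(arrowRecordBatch.ColumnNames, ...
                   ["UInt64", "Boolean", "Float64"]);
           end
   
       end
   
   end
   ```
   
   Assuming the file is on the MATLAB path, 
   `runtests("tRecordBatchRoundTripSketch")` would run both methods.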
   
   ### Are there any user-facing changes?
   
   Yes.
   
   1. Added new class `arrow.tabular.RecordBatch`.
   
   **Example**:
   
   ```matlab
   >> matlabTable = table(uint64([1,2,3]'), [true false true]', [0.1, 0.2, 0.3]', VariableNames=["UInt64", "Boolean", "Float64"])
   
   matlabTable =
   
     3x3 table
   
       UInt64    Boolean    Float64
       ______    _______    _______
   
         1        true        0.1  
         2        false       0.2  
         3        true        0.3  
   
   >> arrowRecordBatch = arrow.tabular.RecordBatch(matlabTable)
   
   arrowRecordBatch = 
   
   UInt64:   [
       1,
       2,
       3
     ]
   Boolean:   [
       true,
       false,
       true
     ]
   Float64:   [
       0.1,
       0.2,
       0.3
     ]
   
   >> convertedMatlabTable = table(arrowRecordBatch)    
   
   convertedMatlabTable =
   
     3x3 table
   
       UInt64    Boolean    Float64
       ______    _______    _______
   
         1        true        0.1  
         2        false       0.2  
         3        true        0.3  
   
   >> isequal(matlabTable, convertedMatlabTable)
   
   ans =
   
     logical
   
      1
   ```
   
   2. Added properties `NumColumns` and `ColumnNames` to 
`arrow.tabular.RecordBatch`:
   
   **Example**:
   
   ```matlab
   >> arrowRecordBatch.NumColumns 
   
   ans =
   
     int32
   
      3
   
   >> arrowRecordBatch.ColumnNames
   
   ans = 
   
     1x3 string array
   
       "UInt64"    "Boolean"    "Float64"
   ```
   
   3. Added `column(i)` method to `arrow.tabular.RecordBatch` to retrieve the 
`i`th column of a `RecordBatch` as an `arrow.array.Array`.
   
   **Example**:
   
   ```matlab
   >> arrowUInt64Array = arrowRecordBatch.column(1) 
   
   arrowUInt64Array = 
   
   [
     1,
     2,
     3
   ]
   
   >> class(arrowUInt64Array)
   
   ans =
   
       'arrow.array.UInt64Array'
   
   >> arrowBooleanArray = arrowRecordBatch.column(2)
   
   arrowBooleanArray = 
   
   [
     true,
     false,
     true
   ]
   
   >> class(arrowBooleanArray)
   
   ans =
   
       'arrow.array.BooleanArray'
   
   >> arrowFloat64Array = arrowRecordBatch.column(3)
   
   arrowFloat64Array = 
   
   [
     0.1,
     0.2,
     0.3
   ]
   
   >> class(arrowFloat64Array)
   
   ans =
   
       'arrow.array.Float64Array'
   ```
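   
   The properties and the `column(i)` method can also be combined to walk 
   every column of a `RecordBatch`. The following is a minimal sketch that 
   uses only the interfaces shown in this description. The conversion of 
   `NumColumns` to `double` is a defensive assumption, since the examples here 
   only show `column` called with a `double` index.
   
   ```matlab
   % Build the same RecordBatch used in the examples above.
   matlabTable = table(uint64([1,2,3]'), [true false true]', ...
       [0.1, 0.2, 0.3]', ...
       VariableNames=["UInt64", "Boolean", "Float64"]);
   arrowRecordBatch = arrow.tabular.RecordBatch(matlabTable);
   
   % NumColumns is an int32; convert it so the loop index is a double
   % (assumption: column(i) is only demonstrated with double indices).
   numColumns = double(arrowRecordBatch.NumColumns);
   columnNames = arrowRecordBatch.ColumnNames;
   
   % Record the class of each column's arrow.array.Array.
   columnClasses = strings(1, numColumns);
   for i = 1:numColumns
       columnArray = arrowRecordBatch.column(i);
       columnClasses(i) = class(columnArray);
       fprintf("%s: %s\n", columnNames(i), columnClasses(i));
   end
   ```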
   
   ### Future Directions
   
   1. Implement C++ logic for `toMATLAB` when the Arrow memory for a 
`RecordBatch` did not originate from a MATLAB array (e.g. it was read from a 
Parquet file or another source).
   2. Add more supported construction interfaces (e.g. 
`arrow.tabular.RecordBatch(array1, ..., arrayN)`, 
`arrow.tabular.RecordBatch.fromArrays(arrays)`, etc.).
   3. Create an `arrow.tabular.Schema` class. Expose this as a public property 
on the `RecordBatch` class. Create related `arrow.type.Field` and 
`arrow.type.Type` classes.
   4. Create an `arrow.tabular.Table` class and a related 
`arrow.array.ChunkedArray` class.
   5. Add more `arrow.array.Array` types (e.g. `StringArray`, `TimestampArray`, 
`Time64Array`).
   6. Create a basic workflow example of serializing a `RecordBatch` to disk 
using an I/O function (e.g. Parquet writing).
   
   ### Notes
   
   1. Thanks @sgilmore10 for your help with this pull request!

