kevingurney opened a new pull request, #34563:
URL: https://github.com/apache/arrow/pull/34563

   ### Rationale for this change
   
   This pull request is a follow-up to [this mailing list 
discussion](https://lists.apache.org/thread/2kgxbs54dw4wvcwrthzvb1ljqcvnrv7h) 
about integrating 
[`mathworks/libmexclass`](https://github.com/mathworks/libmexclass/) with the 
MATLAB Interface to Arrow code base.
   
   We've spent the last few months working on building `libmexclass` from 
scratch in order to ease development of the MATLAB Interface to Arrow. 
`libmexclass` essentially provides a way to connect MATLAB classes with 
corresponding C++ classes using an approach inspired by the [Proxy Design 
Pattern](https://en.wikipedia.org/wiki/Proxy_pattern).
   
   Our hope is that using `libmexclass` will enable us to more easily build out 
an object-oriented MATLAB Interface to Arrow memory by wrapping corresponding 
Arrow C++ classes and "proxying" method calls on these MATLAB objects to the 
underlying Arrow C++ objects.
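
   To make the "proxying" idea concrete, here is a minimal, self-contained 
   sketch of one way it could work. It is not the `libmexclass` API: 
   `NativeArray`, `ArrayProxy`, `ProxyRegistry`, and the method names 
   `"Length"` and `"Sum"` are all hypothetical names chosen for illustration. 
   The assumption is that the MATLAB side holds only an integer ID and a 
   method name, and the C++ side resolves both to a call on a real object.

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <functional>
   #include <map>
   #include <memory>
   #include <stdexcept>
   #include <string>
   #include <unordered_map>
   #include <utility>
   #include <vector>

   // Hypothetical stand-in for a wrapped C++ class (not an Arrow type).
   class NativeArray {
   public:
       explicit NativeArray(std::vector<double> values) : values_(std::move(values)) {}
       double length() const { return static_cast<double>(values_.size()); }
       double sum() const {
           double total = 0.0;
           for (double v : values_) total += v;
           return total;
       }
   private:
       std::vector<double> values_;
   };

   // Hypothetical proxy: maps method names to calls on the wrapped object.
   class ArrayProxy {
   public:
       explicit ArrayProxy(std::vector<double> values)
           : native_(std::make_shared<NativeArray>(std::move(values))) {
           methods_["Length"] = [this] { return native_->length(); };
           methods_["Sum"] = [this] { return native_->sum(); };
       }
       // The lambdas capture `this`, so copying or moving is disallowed.
       ArrayProxy(const ArrayProxy&) = delete;
       ArrayProxy& operator=(const ArrayProxy&) = delete;

       double call(const std::string& name) const {
           auto it = methods_.find(name);
           if (it == methods_.end()) throw std::invalid_argument("unknown method: " + name);
           return it->second();
       }
   private:
       std::shared_ptr<NativeArray> native_;
       std::unordered_map<std::string, std::function<double()>> methods_;
   };

   // Hypothetical registry: the scripting side would store only the ID.
   class ProxyRegistry {
   public:
       std::uint64_t create(std::vector<double> values) {
           const std::uint64_t id = next_id_++;
           proxies_.emplace(id, std::make_unique<ArrayProxy>(std::move(values)));
           return id;
       }
       double call(std::uint64_t id, const std::string& method) const {
           return proxies_.at(id)->call(method);  // throws std::out_of_range for a bad ID
       }
       void destroy(std::uint64_t id) { proxies_.erase(id); }
       std::size_t size() const { return proxies_.size(); }
   private:
       std::uint64_t next_id_ = 1;
       std::map<std::uint64_t, std::unique_ptr<ArrayProxy>> proxies_;
   };
   ```

   In this sketch, a call such as `registry.call(id, "Sum")` plays the role 
   of a method call on a MATLAB object being forwarded to its C++ 
   counterpart, and `registry.destroy(id)` plays the role of the MATLAB 
   object being deleted.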
   
   ### What changes are included in this PR?
   
   1. The CMake build system for the MATLAB interface was modified to use 
`libmexclass`. This includes a new build flag, `MATLAB_ARROW_INTERFACE` 
(passed as `-D MATLAB_ARROW_INTERFACE=ON` or `OFF`), which toggles building 
the new `libmexclass`-based code.
   2. To illustrate the basic usage of `libmexclass`, we have added one new 
MATLAB class `arrow.array.Float64Array`. This class allows users to construct 
an Arrow array with logical type `Float64` from a MATLAB `double` array with 
zero data copies. Under the hood, a `Proxy` wraps and bounds the lifetime of 
the underlying Arrow C++ `Float64Array` object. In addition, this `Proxy` is 
responsible for delegating method calls on an `arrow.array.Float64Array` to the 
corresponding Arrow C++ `Float64Array`.
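
   The two properties described in item 2, zero-copy construction and a 
   `Proxy` that bounds the lifetime of the wrapped object, can be sketched 
   roughly as follows. This is an illustration under stated assumptions, not 
   the code in this pull request: `Float64View` is a hypothetical stand-in 
   for the Arrow C++ `Float64Array`, `Float64ArrayProxy` and `observe()` are 
   made-up names, and the sketch assumes the caller keeps the original buffer 
   alive for as long as the proxy exists.

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <memory>
   #include <vector>

   // Hypothetical stand-in for an Arrow array: it views memory it does not own.
   class Float64View {
   public:
       Float64View(const double* data, std::size_t length) : data_(data), length_(length) {}
       const double* data() const { return data_; }
       std::size_t length() const { return length_; }
       double value(std::size_t i) const { return data_[i]; }
   private:
       const double* data_;   // borrowed pointer: no copy of the values is made
       std::size_t length_;
   };

   // Hypothetical proxy: owns the view and forwards method calls to it.
   class Float64ArrayProxy {
   public:
       Float64ArrayProxy(const double* data, std::size_t length)
           : array_(std::make_shared<Float64View>(data, length)) {}

       // Delegation: each call is forwarded to the wrapped object.
       std::size_t length() const { return array_->length(); }
       double value(std::size_t i) const { return array_->value(i); }
       const double* data() const { return array_->data(); }

       // Lets a caller observe (without extending) the wrapped object's lifetime.
       std::weak_ptr<Float64View> observe() const { return array_; }
   private:
       std::shared_ptr<Float64View> array_;  // released when the proxy is destroyed
   };
   ```

   Because the proxy holds the only owning reference in this sketch, 
   destroying the proxy destroys the wrapped object, while the underlying 
   values are never copied: `proxy.data()` returns the same address the 
   caller passed in.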
   
   ### Are these changes tested?
   
   Yes, these changes have been tested on Linux, macOS, and Windows.
   
   1. We've modified the MATLAB CI GitHub Actions workflow 
(`.github/workflows/matlab.yml`) to build the new `arrow.array.Float64Array` 
code using `libmexclass`. This includes passing `-D MATLAB_ARROW_INTERFACE=ON` 
to the `cmake` invocation in `ci/scripts/matlab_build.sh`.
   2. We've added a new basic MATLAB test `test/arrow/array/tFloat64Array.m` 
which tests for successful construction of an `arrow.array.Float64Array`.
   3. We've confirmed that the `Dev` CI workflow linting checks are all passing 
and appropriate Apache license headers have been added.
   4. We've manually tested creation, deletion, and assignment of multiple 
`arrow.array.Float64Array` instances on Linux, macOS, and Windows with a 
variety of different MATLAB `double` arrays.
   
   ### Are there any user-facing changes?
   
   Yes, there is now a public class named `arrow.array.Float64Array` which is 
added to the MATLAB Path.
   
   Included below is a simple example of creating two different 
`arrow.array.Float64Array` objects in MATLAB:
   
   ```matlab
   >> A = arrow.array.Float64Array([1, 2, 3])            
   
   A = 
   
   [
     1,
     2,
     3
   ]
   
   >> random = arrow.array.Float64Array(rand(1, 10, 100))
   
   random = 
   
   [
     0.6311887342690112,
     0.355073651878849,
     0.9970032716066477,
     0.22417149898312716,
     0.6524510729686149,
     0.6049906419082594,
     0.38724543148313495,
     0.14218715929050407,
     0.025134985710203117,
     0.4211122537652413,
     ...
     0.6228027906591304,
     0.7966246853083961,
     0.74587490154065,
     0.12553623135481973,
     0.8223940067590204,
     0.02515050142850217,
     0.41442888092403163,
     0.7314074679729372,
     0.7813740002759628,
     0.367285915131369
   ]
   
   ```
   
   **Note**: This is an early stage PR, so the naming scheme 
`arrow.array.<Type>Array` might change in the future.
   
   ### Future Directions
   
   1. Currently, the "old" `featherread`/`featherwrite` code is still being 
built by CMake and installed to the specified `CMAKE_INSTALL_PREFIX`. This 
slows down the build process and complicates the build system logic. In 
addition, these Feather functions only support reading and writing a subset of 
Feather V1 files. We should consider disabling the build of this legacy code 
by default or removing it entirely. In the long term, when we have more Arrow 
types in MATLAB (e.g. `arrow.Table`, `arrow.Schema`, `arrow.RecordBatch`, etc.) 
we should consider re-implementing this functionality in terms of the new APIs.
   2. We would like to start adding more numeric array classes, such as 
`arrow.array.UInt8Array` and `arrow.array.Int64Array`.
   3. We only added one very basic test for `arrow.array.Float64Array` in this 
pull request. We should add a lot more tests as the APIs develop to test things 
like indexing, copying, slicing, etc.
   4. We don't have any documentation for `arrow.array.Float64Array` right now. 
In general, we should start adding detailed documentation for the new APIs as 
we start to implement them.
   5. Lots more! This is just the beginning of building out the MATLAB 
Interface to Arrow APIs. We plan on creating GitHub issues for tracking work as 
we go.
   
   ### Notes
   
   1. Creating `libmexclass` and integrating it with the Arrow code base was a 
team effort! Thank you to @sreeharihegden, @lafiona, @sgilmore10, @jhughes-mw, 
and others at @MathWorks for their help with this pull request!

