lafiona opened a new pull request #12004:
URL: https://github.com/apache/arrow/pull/12004


   ## Overview
   
   This pull request implements a more scalable and sustainable approach to organizing the C++ functionality that needs to be exposed to MATLAB. It adds a single new MEX gateway function, `mexfcn('<cpp-function-name>', <cpp-function-argument-1>, ..., <cpp-function-argument-N>)`, which delegates to the named C++ function. To use `mexfcn`, add the directory containing `mexfcn.<mex-extension>` to the MATLAB path.
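
As an illustration of the delegation described above, here is a minimal sketch of name-based dispatch in plain C++. It is not the code from this pull request: `Value` is a hypothetical stand-in for MATLAB's `mxArray`, and `echo_impl`, `registry()`, and `dispatch()` are hypothetical names. The gateway reads the function name from the first input and forwards the remaining inputs, keeping the `(nlhs, plhs, nrhs, prhs)` parameter order of a MEX gateway.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for MATLAB's mxArray; a real MEX file uses the types from mex.h.
struct Value {
  std::string text;
};

// Mirrors the (nlhs, plhs, nrhs, prhs) parameter order of a MEX gateway,
// using plain Value arrays instead of mxArray pointers.
using MexImpl = void (*)(int nlhs, Value plhs[], int nrhs, const Value prhs[]);

// Hypothetical C++ implementation: copies its first input to its first output.
void echo_impl(int nlhs, Value plhs[], int nrhs, const Value prhs[]) {
  if (nlhs < 1 || nrhs < 1) {
    throw std::invalid_argument("echo: expected one input and one output");
  }
  plhs[0].text = prhs[0].text;
}

// Hypothetical registry mapping MATLAB-visible names to C++ implementations.
// Exposing a new function only requires adding an entry here.
const std::unordered_map<std::string, MexImpl>& registry() {
  static const std::unordered_map<std::string, MexImpl> functions = {
      {"echo", &echo_impl},
  };
  return functions;
}

// Gateway: prhs[0] names the function; the remaining inputs are forwarded as-is.
void dispatch(int nlhs, Value plhs[], int nrhs, const Value prhs[]) {
  if (nrhs < 1) {
    throw std::invalid_argument("mexfcn: missing function name");
  }
  const std::string& name = prhs[0].text;
  const auto it = registry().find(name);
  if (it == registry().end()) {
    throw std::invalid_argument("mexfcn: unknown function '" + name + "'");
  }
  it->second(nlhs, plhs, nrhs - 1, prhs + 1);  // drop the name argument
}
```

For example, calling `dispatch` with the inputs `{"echo", "featherfile.feather"}` and one output slot fills that output with `"featherfile.feather"`. A real gateway would typically report errors through MATLAB rather than letting C++ exceptions escape the MEX function.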
   
   Advantages of this approach:
   - Organization
       - All C++ functions that are exposed to MATLAB are registered in one location
       - Reduce the complexity of managing and linking many MEX files and ensuring that they are added to the MATLAB path
       - Reduce cognitive load when adding new functions
   - Avoid polluting the source tree with build artifacts
   - Reduce build times: building a single MEX function is faster than building potentially hundreds
   - Reduce binary bloat caused by creating a separate MEX file for every function exposed to MATLAB
   - Enable flexibility in where C++ implementation files live
   
   ## Implementation
   
   1. One MEX function, `mexfcn`, defined in `matlab/src/cpp/arrow/matlab/mex/mex_util.h`, dispatches to the individual C++ implementations.
   
   For example, to invoke the C++ implementation of `featherread` from MATLAB:
   
   ``` matlab
   >> [variables, metadata] = mexfcn('featherread', 'featherfile.feather');
   ```
   
   2. Functionality implemented in C++ that we want to expose to MATLAB is registered in a function map in `matlab/src/cpp/arrow/matlab/mex/mex_functions.h`.
   3. Restructured the source tree layout and performed general code cleanup in preparation for feature implementation work:
     3.1. Split the source code into `matlab/src/matlab` and `matlab/src/cpp`.
     3.2. Made packages, namespaces, and directories consistent in naming and hierarchy to simplify navigation and header inclusion.
     3.3. Renamed the MATLAB package `mlarrow` to `arrow`, as the `ml` prefix is superfluous.
   4. Refactored `matlab/CMakeLists.txt`:
     4.1. Build a shared library, `arrow_matlab`, that contains the C++ functionality for the interface.
     4.2. macOS: explicitly add the path of `arrow` to the `rpath` of `arrow_matlab`, because the paths of libraries output by imported targets are not included automatically.
     4.3. Windows:
       - Copy shared libraries to the location of the MEX function, `mexfcn`, including `gtest.dll` and `gtest_main.dll` when building C++ tests.
       - Add the path to MATLAB to the ctest `ENVIRONMENT` when building tests.
       - Specify the release version of the MSVC runtime libraries for all targets created in the CMake file.
   
   ## Testing
   
   Qualified the `CMakeLists.txt` changes by building and running all tests:
     - On Windows 10 (Ninja and Visual Studio), macOS 11.5 (Make and Ninja), and Debian 10 (Make and Ninja)
     - Configurations: build both Arrow and GTest, use a provided `ARROW_HOME`, use a provided `GTEST_ROOT`, and use both `ARROW_HOME` and `GTEST_ROOT`.
   
   ## Future Directions
   1. Currently, users on macOS can use Arrow and GTest binaries that were built independently, but they cannot relocate and reuse the Arrow and GTest binaries built by the MATLAB Interface build system. On macOS, shared libraries record the paths of the libraries they link against, so relocating them invalidates their `rpath` entries. On Windows and Linux, a user can:
       - Build Arrow and GTest binaries in the MATLAB Interface build system
       - Move those binaries to a separate location outside of the build tree
       - Use those binaries for subsequent builds, pointing to them using `ARROW_HOME` and `GTEST_ROOT`
   2. Investigate why the default CMake behavior does not link the test executables against the correct MSVC runtime libraries (i.e., `ucrtbase.dll` versus `ucrtbased.dll`) when building with Ninja on Windows.
   3. Add support to `mexfcn` for specifying function names and arguments as MATLAB strings. Currently, only character vectors are supported.
   4. Refactor `mexfcn` to use [MATLAB Data Arrays](https://uk.mathworks.com/help/matlab/matlab-data-array.html) (MDAs) and [C++ MEX](https://uk.mathworks.com/help/matlab/cpp-mex-file-applications.html).
   5. For multi-configuration generators, the binaries are built into a subdirectory named after the build configuration. `CMakeLists.txt` currently creates a text file, `is_multi_config.txt`, that lists the configuration used. The next step is to use this file in the MATLAB tests so that they can find the MEX file.
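
The lookup that the MATLAB tests would need is sketched here in C++ for illustration only; the tests themselves would be written in MATLAB. The sketch assumes, hypothetically, that `is_multi_config.txt` holds the configuration name (for example, `Release`) on its first line and that a missing or empty file means a single-configuration generator; the real file format may differ, and `mex_output_dir` is a hypothetical helper name.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: resolve the directory that should hold mexfcn.<mex-extension>.
// ASSUMPTION: is_multi_config.txt, if present, holds the configuration name
// (e.g. "Release") on its first line; the actual format may differ.
fs::path mex_output_dir(const fs::path& build_dir) {
  std::ifstream config_file(build_dir / "is_multi_config.txt");
  std::string config;
  if (config_file && std::getline(config_file, config)) {
    // Trim a trailing carriage return left by Windows line endings.
    if (!config.empty() && config.back() == '\r') {
      config.pop_back();
    }
    if (!config.empty()) {
      return build_dir / config;  // multi-config generator: <build-dir>/<config>
    }
  }
  return build_dir;  // single-config generator: binaries sit in the build directory
}
```

Under that assumed format, a build directory whose `is_multi_config.txt` contains `Release` resolves to `<build-dir>/Release`, where the tests would then look for `mexfcn.<mex-extension>`.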
   
   
   ## Notes
   1. Thank you for all of your help on this pull request, @sgilmore10 and @kevingurney!

