lafiona opened a new pull request #12004: URL: https://github.com/apache/arrow/pull/12004
## Overview

This pull request implements a more scalable and sustainable approach to organizing the C++ functionality that needs to be exposed to MATLAB. It adds a single MEX gateway function, `mexfcn('<cpp-function-name>', <cpp-function-argument-1>, ..., <cpp-function-argument-N>)`, which delegates to specific C++ functions. To use `mexfcn`, the directory containing `mexfcn.<mex-extension>` must be added to the MATLAB path.

Advantages of this approach:

- Organization: all C++ functions that are exposed to MATLAB are registered in one location
- Reduce the complexity of managing and linking many MEX files and ensuring that they are added to the MATLAB path
- Reduce cognitive load for adding new functions
- Avoid polluting the source tree with build artifacts
- Reduce build times: building a single MEX function is faster than building potentially hundreds
- Reduce binary bloat caused by creating a separate MEX file for every MEX function
- Enable flexibility in terms of where C++ implementation files live

## Implementation

1. One MEX function, `mexfcn`, defined in `matlab/src/cpp/arrow/matlab/mex/mex_util.h`, dispatches to individual C++ implementation files. For example, to invoke the `featherread` functionality, which is implemented in C++, from MATLAB:

   ``` matlab
   >> [variables, metadata] = mexfcn('featherread', 'featherfile.feather');
   ```

2. Functionality implemented in C++ that we want to expose in MATLAB is registered in a function map in the file `matlab/src/cpp/arrow/matlab/mex/mex_functions.h`.

3. Restructured the source tree layout and performed general code cleanup in preparation for feature implementation work:

   3.1. Split the source code into `matlab/src/matlab` and `matlab/src/cpp`.

   3.2. Made packages, namespaces, and directories consistent in naming and hierarchy to simplify navigation and header inclusion.

   3.3. Renamed the MATLAB package `mlarrow` to `arrow`, as the `ml` prefix is superfluous.

4. Refactored `matlab/CMakeLists.txt`:

   4.1. Build a shared library, `arrow_matlab`, that contains the C++ functionality for the interface.

   4.2. macOS: explicitly add the path of `arrow` to the `rpath` of `arrow_matlab`, as paths of libraries output by imported targets are not automatically included.

   4.3. Windows:
   - Copy shared libraries to the location of the MEX function, `mexfcn`, including `gtest.dll` and `gtest_main.dll` when building C++ tests.
   - Add the path to MATLAB to the ctest `ENVIRONMENT` when building tests.
   - Specify the release version of the MSVC Runtime Libraries for all targets created in the CMake file.

## Testing

Qualified the `CMakeLists.txt` changes by building and running all tests:

- Platforms: Windows 10 (Ninja and Visual Studio), macOS 11.5 (Make and Ninja), and Debian 10 (Make and Ninja)
- Configurations: build both Arrow and GTest, use provided `ARROW_HOME`, use provided `GTEST_ROOT`, use both `ARROW_HOME` and `GTEST_ROOT`.

## Future Directions

1. Currently, users on macOS can use Arrow and GTest binaries that were built independently, but they cannot relocate the Arrow and GTest binaries built via the MATLAB Interface build system and reuse them. On macOS, shared libraries contain the paths of linked libraries; therefore, relocating them invalidates the libraries' `rpath` entries. On Windows and Linux, a user can:
   - Build Arrow and GTest binaries in the MATLAB Interface build system
   - Move those binaries to a separate location outside of the build tree
   - Use those binaries for subsequent builds, pointing to them using `ARROW_HOME` and `GTEST_ROOT`

2. Investigate why the default CMake behavior does not link the test executables against the correct MSVC Runtime libraries (i.e., `ucrtbase.dll` versus `ucrtbased.dll`) when building with Ninja on Windows.

3. Add support for specifying function names and arguments as MATLAB strings to `mexfcn`. Currently, only character vectors are supported.

4. Refactor `mexfcn` to use [MATLAB Data Arrays](https://uk.mathworks.com/help/matlab/matlab-data-array.html) (MDAs) and [C++ MEX](https://uk.mathworks.com/help/matlab/cpp-mex-file-applications.html).

5. For multi-configuration generators, the binaries are built into a subdirectory indicating the build configuration. `CMakeLists.txt` currently creates a text file, `is_multi_config.txt`, that lists the configuration used. The next step is to use this file in the MATLAB tests to ensure they are able to find the MEX file.

## Notes

1. Thank you for all of your help on this pull request, @sgilmore10 and @kevingurney!
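
The name-based dispatch described under Implementation (one gateway that looks up a registered C++ function by name and forwards the remaining arguments) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in `mex_util.h` or `mex_functions.h`: the names `Args`, `Handler`, `function_map`, and `dispatch` are hypothetical, and plain strings stand in for the MATLAB array arguments so the sketch compiles without MATLAB installed.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins: a real MEX gateway receives MATLAB arrays;
// strings are used here so the sketch has no MATLAB dependency.
using Args = std::vector<std::string>;
using Handler = std::function<std::string(const Args&)>;

// Hypothetical implementation function standing in for a Feather reader.
// It only echoes the filename; it does not read any file.
std::string featherread(const Args& args) {
  if (args.size() != 1) {
    throw std::invalid_argument("featherread expects exactly one filename");
  }
  return "read:" + args[0];
}

// Single registration point: every function exposed through the gateway
// is listed in this one map.
const std::unordered_map<std::string, Handler>& function_map() {
  static const std::unordered_map<std::string, Handler> map = {
      {"featherread", featherread},
  };
  return map;
}

// Gateway: look up the requested name and forward the remaining arguments.
std::string dispatch(const std::string& name, const Args& args) {
  const auto it = function_map().find(name);
  if (it == function_map().end()) {
    throw std::invalid_argument("unknown function: " + name);
  }
  return it->second(args);
}
```

In this sketch, `dispatch("featherread", {"featherfile.feather"})` plays the role of the `mexfcn('featherread', 'featherfile.feather')` call, and exposing another function would only require one more entry in the map.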