kevingurney commented on code in PR #34563: URL: https://github.com/apache/arrow/pull/34563#discussion_r1167214786
########## matlab/tools/cmake/BuildMatlabArrowInterface.cmake: ########## @@ -0,0 +1,80 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# ------- +# Config +# ------- + +# Build configuration for libmexclass. +set(CUSTOM_PROXY_FACTORY_INCLUDE_DIR "${CMAKE_SOURCE_DIR}/src/cpp/arrow/matlab/proxy;${CMAKE_SOURCE_DIR}/src/cpp") +set(CUSTOM_PROXY_FACTORY_SOURCES "${CMAKE_SOURCE_DIR}/src/cpp/arrow/matlab/proxy/factory.cc") +set(CUSTOM_PROXY_SOURCES "${CMAKE_SOURCE_DIR}/src/cpp/arrow/matlab/array/proxy/float64_array.cc") +set(CUSTOM_PROXY_INCLUDE_DIR "${CMAKE_SOURCE_DIR}/src/cpp;${ARROW_INCLUDE_DIR}") +set(CUSTOM_PROXY_LINK_LIBRARIES ${ARROW_LINK_LIB}) +# On Windows, arrow.dll must be installed regardless of +# whether Arrow_FOUND is true or false. Therefore, we explicitly +# copy ARROW_SHARED_LIB to the installation folder +libmexclass/+proxy. +set(CUSTOM_PROXY_RUNTIME_LIBRARIES ${ARROW_SHARED_LIB}) +set(CUSTOM_PROXY_FACTORY_HEADER_FILENAME "factory.h") +set(CUSTOM_PROXY_FACTORY_CLASS_NAME "arrow::matlab::proxy::Factory") Review Comment: **Update**: We think we've identified the issue! --- ## Overview After a lot of debugging and research, we realized that when the Arrow C++ libraries (i.e. `arrow.dll`) are built in `Debug` mode, they are linked against a `Debug` version of the [MSVC Runtime](https://learn.microsoft.com/en-us/cpp/c-runtime-library/crt-library-features) (i.e. `VCRUNTIME140D.dll` - note the **D** for **D**ebug in the DLL name). At the same time, `mexclass.dll`, `arrowproxy.dll`, and `gateway.mexw64` are all built by default against a non-`Debug` version of the MSVC Runtime (i.e. `VCRUNTIME140.dll` - note the lack of **D** in the DLL name). --- ### Checking the MSVC Runtime using `dumpbin` We were able to verify the different MSVC Runtime versions in use by the various binaries by running `dumpbin` in an instance of `x64 Native Tools Command Prompt for VS 2019`: ```shell $ dumpbin /dependents arrow.dll Microsoft (R) COFF/PE Dumper Version 14.29.30137.0 Copyright (C) Microsoft Corporation. All rights reserved. Dump of file arrow.dll File Type: DLL Image has the following dependencies: KERNEL32.dll SHELL32.dll ole32.dll MSVCP140D.dll VCRUNTIME140D.dll VCRUNTIME140_1D.dll ucrtbased.dll Summary 1000 .00cfg 33000 .data 6000 .idata 142000 .pdata 606000 .rdata 29000 .reloc 1000 .rsrc 117D000 .text 1000 .tls ``` Note the presence of `VCRUNTIME140D.dll` in the `dumpbin` output for `arrow.dll` above. --- ### Incompatibility of Multiple Different MSVC Runtimes Unfortunately, there are no guarantees that one version of the MSVC Runtime will be compatible with another. This limitation is documented in the MSVC documentation page ["C runtime (CRT) and C++ standard library (STL) `.lib` files"](https://learn.microsoft.com/en-us/cpp/c-runtime-library/crt-library-features?view=msvc-170#what-problems-exist-if-an-application-uses-more-than-one-crt-version). Included below is a relevant excerpt from the documentation: > A single process may load multiple EXE and DLL images, each with its own CRT. Each of those CRTs may use a different allocator, may have different internal structure layouts, and may use different storage arrangements. It means allocated memory, CRT resources, or classes passed across a DLL boundary can cause problems in memory management, internal static usage, or layout interpretation. For example, if a class is allocated in one DLL but passed to and deleted by another, which CRT deallocator is used? The errors caused can range from the subtle to the immediately fatal, and therefore direct transfer of such resources is discouraged. It turns out that the case being described above is exactly what was happening when calling `arrow->ToString()` (defined in `arrow.dll`) from `Float64Array::Print` (defined in `arrowproxy.dll`). Calling `ToString()` on an `arrow::Array` instance crosses over into `arrow.dll` (which uses a completely separate MSVC Runtime from `arrowproxy.dll` when built in `Debug` mode). When `ToString()` allocates memory for a `std::string` within the heap space of the MSVC Runtime used by `arrow.dll` and then returns that `std::string` to `arrowproxy.dll`, undefined behavior is invoked, which eventually leads to the memory corruption we observed. --- ### Fixing the Issue To fix the memory corruption, we modified `matlab/tools/cmake/BuildMatlabArrowInterface.cmake` so that all of the client `libmexclass` binaries (i.e. `mexclass.dll`, `arrowproxy.dll`, and `gateway.mexw64`) link against a `Debug` version of the MSVC Runtime - just like `arrow.dll` does. We were able to accomplish this by [setting the `MSVC_RUNTIME_LIBRARY` target property to `"MultiThreadedDebugDLL"` (i.e. `/MDd`)](https://cmake.org/cmake/help/latest/prop_tgt/MSVC_RUNTIME_LIBRARY.html) to match the `Debug` build of `arrow.dll`. This excerpt from the [MSVC documentation](https://learn.microsoft.com/en-us/cpp/c-runtime-library/crt-library-features?view=msvc-170#what-problems-exist-if-an-application-uses-more-than-one-crt-version) alludes to this fix: > It's also possible to avoid some of these issues if all of the images in your process use the same dynamically loaded version of the CRT. To ensure that all components use the same DLL version of the CRT, build them by using the /MD option, and use the same compiler toolset and property settings. The MSVC documentation also provides more information about the potential issues that can arise when passing resources between DLLs in the page ["Potential errors passing CRT objects across DLL boundaries"](https://learn.microsoft.com/en-us/cpp/c-runtime-library/potential-errors-passing-crt-objects-across-dll-boundaries). The [common advice we've seen around the Internet](https://stackoverflow.com/questions/22797418/how-do-i-safely-pass-objects-especially-stl-objects-to-and-from-a-dll) relating to this topic is that there is no straightforward, "safe" way to export APIs from a DLL which return STL objects (e.g. `std::string` returned by `arrow::Array::ToString`). My guess is that other community members have already thought through these limitations of the Arrow C++ APIs which return STL objects before. However, I think it may be worth adding a note / warning to the ["Debug Builds" section of the "Building on Windows" documentation for the Arrow C++ libraries](https://arrow.apache.org/docs/developers/cpp/windows.html#debug-builds) about these potential MSVC Runtime compatibility issues so that others hopefully won't have to spend as much effort debugging issues like this in the future. *We would be happy to follow up with a separate pull request to enhance the documentation with this information!* --- ### Open Questions Although we have identified the source of the memory corruption, there is still an open question about whether it is "safe" to build `mexclass.dll`, `arrowproxy.dll`, and `gateway.mexw64` against `VCRUNTIME140D.dll`. MEX functions are eventually loaded into the MATLAB process address space when called, so it isn't immediately clear whether this will interact as expected in all cases with the MSVC Runtime being used by MATLAB (i.e. `matlab.exe`). To get a better understanding of the potential implications here, it would make sense for us to consult with some other colleagues at MathWorks who have more domain expertise in this area. It may also make sense to consider modifying the [`CMakeLists.txt` for `mathworks/libmexclass`](https://github.com/mathworks/libmexclass/blob/main/libmexclass/cpp/CMakeLists.txt) to automatically link against a `Debug` MSVC Runtime when building a client project in `Debug` mode. If this isn't desirable as a default behavior, it may still be worth ading a note about potential MSVC Runtime compatibility issues to the `libmexclass` documentation. --- ### Moving Forward Putting the open questions about the safety of linking against a `Debug` MSVC Runtime in a MEX function to the side, it seems reasonable to treat this as a non-pressing issue since the vast majority of MATLAB users won't be building the MATLAB Interface to Arrow against a `Debug` build of `arrow.dll`. In addition, the fix we have identified (i.e. build the client `libmexclass` libraries with `/MDd`) appears to be at least a partial workaround for community members who do want to use a `Debug` build of `arrow.dll`. We apologize again for taking some time to get to the bottom of this. It was quite surprising that this is what ended up being the source of the memory corruption. In order to get to the root cause, we first needed to narrow down the reproduction steps and get a better understanding of the differences between `Release` and `Debug` builds. We wanted to ensure this memory corruption wasn't indicative of a deeper issue in the implementation of `libmexclass` before building more code on top of it within the upstream `apache/arrow` code base. Thank you to everyone in the community for your patience while debugging this! We should now be able to concentrate more of our energy on pushing the MATLAB interface forward since this doesn't seem to be a blocking issue. Please don't hesitate to let us know if you have questions about any of this! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
