There's perhaps a simpler way for MATLAB to solve this:
1. link Arrow statically inside your own libmatlab_arrow.dll
2. have libmatlab_arrow.dll expose its own API corresponding to MATLAB needs
3. arrange for Arrow symbols to not be exposed by libmatlab_arrow.dll
I suppose it depends whether the API needed by MATLAB is small or large.
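
A minimal sketch of what steps 2 and 3 could look like, assuming a GCC/Clang or MSVC toolchain. The names `MATLAB_ARROW_EXPORT`, `matlab_arrow_total_rows` and `SumRowCounts` are hypothetical and only stand in for whatever API MATLAB actually needs; the helper in the anonymous namespace stands in for statically linked Arrow code.

```cpp
// matlab_arrow.cc (sketch): only the facade function is exported.
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(_WIN32)
#define MATLAB_ARROW_EXPORT __declspec(dllexport)
#else
#define MATLAB_ARROW_EXPORT __attribute__((visibility("default")))
#endif

namespace {
// Hypothetical stand-in for code that would call into the statically
// linked Arrow library. Internal linkage keeps it out of the export table.
std::int64_t SumRowCounts(const std::int64_t* counts, std::size_t n) {
  assert(counts != nullptr || n == 0);
  std::int64_t total = 0;
  for (std::size_t i = 0; i < n; ++i) total += counts[i];
  return total;
}
}  // namespace

// The facade API: a plain C entry point with no Arrow types in its signature.
extern "C" MATLAB_ARROW_EXPORT std::int64_t matlab_arrow_total_rows(
    const std::int64_t* counts, std::size_t n) {
  return SumRowCounts(counts, n);
}
```

On Linux, building with `-fvisibility=hidden` and linking with `-Wl,--exclude-libs,ALL` is one way to keep the symbols of the static Arrow archive out of the shared library's dynamic symbol table; on Windows, symbols are not exported unless marked `__declspec(dllexport)` or listed in a .def file.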
Regards
Antoine.
On 10/03/2021 at 04:36, Tahsin Hassan wrote:
Hi Wes,
Thanks for providing the feedback.
1. Regarding the symbol versioning change
As a next step, should I open a Jira issue and assign it to myself to see
whether I can automate adding the BEGIN_ARROW_NS / END_ARROW_NS macros
using LLVM/Clang AST matchers?
We can submit this as a one-time mechanical change like you mentioned.
However, should “setting up a coding guideline ensuring that all headers
include `arrow/config.h`” also be part of the same issue?
Is there some other buy-in process for such guidelines?
2. There is another issue, library versioning, which comes even before the
symbol versioning issue.
Right now, both pyarrow and MATLAB ship a DLL named arrow.dll on Windows.
If pyarrow/Python brings in its arrow.dll first, the MATLAB process will not
load our shipped arrow.dll to begin with, due to RTLD_GLOBAL.
We can handle the library versioning issue by changing some CMake
infrastructure so that users can specify their own
SOVERSION - on Linux/macOS.
VERSION/OUTPUT_NAME - on Windows (not sure yet, just a hunch for Windows)
Currently, these CMake variables are located here:
https://github.com/apache/arrow/blob/0f72bcfc67bb4db72c27f9c3282fe5020490f214/cpp/cmake_modules/BuildUtils.cmake#L369
Once again, I would be happy to create another Jira issue and see whether I
could work through the change.
Regards,
Tahsin
From: Wes McKinney <wesmck...@gmail.com>
Date: Sunday, March 7, 2021 at 4:12 PM
To: dev <dev@arrow.apache.org>
Subject: Re: [C++] libarrow isolation
I took a look at the document. Basically you want to have two
different versions of the Arrow shared library loaded into the same
process, with some code linked to one library and some code linked to
another. This is very similar to the problem that Boost addresses with
the `bcp --namespace=$MY_PRIVATE_BOOST_NAMESPACE ...` operation.
To obtain strict symbol isolation you have to change the "arrow::"
namespace in the C++ libraries you are shipping in your application.
AFAIK, this is a bit of a nuisance to do. One way to achieve it is to
replace every use of
namespace arrow {
...
}
with
BEGIN_ARROW_NS
...
END_ARROW_NS
or similar. Then you would do something like `cmake
-DARROW_NAMESPACE=mwarrow ...` when configuring your build.
This change could certainly be implemented in the Arrow library. It is
a one-time mechanical operation, but it will create some ongoing
friction whenever new files are created: every header must then
depend on a central `arrow/config.h` which contains the
needed namespace macros.
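
A minimal sketch of what such a central header could contain, assuming the macro names above. The default value, the stringify helpers, and the functions `NamespaceName` and `DemoCall` are hypothetical and for illustration only; in a real build `ARROW_NAMESPACE` would be supplied by CMake rather than defaulted here.

```cpp
// arrow/config.h (sketch): central definition of the namespace macros.
#include <cassert>
#include <string>

#ifndef ARROW_NAMESPACE
#define ARROW_NAMESPACE arrow  // default keeps the usual namespace name
#endif

#define BEGIN_ARROW_NS namespace ARROW_NAMESPACE {
#define END_ARROW_NS }

// Two-level stringification so the macro's value, not its name, is quoted.
#define ARROW_STRINGIFY_IMPL(x) #x
#define ARROW_STRINGIFY(x) ARROW_STRINGIFY_IMPL(x)

BEGIN_ARROW_NS
// Hypothetical function standing in for real library code.
inline std::string NamespaceName() { return ARROW_STRINGIFY(ARROW_NAMESPACE); }
END_ARROW_NS

// Code outside the namespace block spells the namespace through the macro,
// so it keeps compiling whichever name was chosen at configure time.
inline void DemoCall() { assert(!ARROW_NAMESPACE::NamespaceName().empty()); }
```

Compiling the same file with `-DARROW_NAMESPACE=mwarrow` would place `NamespaceName` in `mwarrow::` instead, which changes the mangled symbol names and so avoids clashes with a stock libarrow loaded in the same process.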
On Fri, Mar 5, 2021 at 9:19 AM Antoine Pitrou <anto...@python.org> wrote:
Hi Tahsin,
Sorry for the lack of response. Unfortunately I feel a bit out of my
depth on linker issues. I hope someone else can give advice.
Regards
Antoine.
On 05/03/2021 at 16:09, Tahsin Hassan wrote:
Hi,
I was wondering whether folks had a chance to look over the material and had
any pointers for the proposed approach.
If I should post in some other format or clarify something, please let me know.
In the meantime, I will try out the steps we propose.
Regards,
Tahsin
From: Tahsin Hassan <thas...@mathworks.com>
Date: Thursday, February 25, 2021 at 11:43 AM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: [C++] libarrow isolation
Hi Antoine,
I struggled a bit to put all my thoughts into an email format that would be
easily consumable.
So I wrote up a GitHub markdown document to add some more detail on the issue
we are facing.
Could you take a look, and let us know your thoughts?
https://github.com/mathworks/matlab-arrow-support-files/blob/main/libarrowclash.md
Regards,
Tahsin
From: Antoine Pitrou <anto...@python.org>
Date: Tuesday, February 23, 2021 at 1:21 PM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: [C++] libarrow isolation
Hi Tahsin,
I see. So the error happens when loading PyArrow into MATLAB, I
suppose? What kind of error do you get?
Regards
Antoine.
On 23/02/2021 at 18:12, Tahsin Hassan wrote:
Hi Antoine,
MATLAB is using RTLD_GLOBAL. Hope that helps in clarifying the workflow.
Regards,
Tahsin
________________________________
From: Antoine Pitrou <anto...@python.org>
Sent: Monday, February 22, 2021 9:41 AM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: [C++] libarrow isolation
On 22/02/2021 at 15:29, Tahsin Hassan wrote:
Hi all,
MATLAB uses the Arrow C++ libraries (i.e. libarrow.so) to read and write
Parquet files (https://www.mathworks.com/help/matlab/ref/parquetread.html).
While exploring ways to integrate more tightly with Arrow, we've run into a
symbol/library naming clash issue.
When running pyarrow within the MATLAB process
(https://www.mathworks.com/help/matlab/call-python-libraries.html?s_tid=CRUX_lftnav),
the libarrow.so loaded by pyarrow clashes with the libarrow.so shipping
with MATLAB.
For the record, Python loads extension modules (such as PyArrow) with
RTLD_LOCAL. I assume MATLAB doesn't?
Regards
Antoine.