There's perhaps a simpler way for MATLAB to solve this:

1. link Arrow statically inside your own libmatlab_arrow.dll
2. have libmatlab_arrow.dll expose its own API corresponding to MATLAB needs
3. arrange for Arrow symbols to not be exposed by libmatlab_arrow.dll
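
The three steps above can be sketched roughly as follows. This is only an illustration, not MATLAB's or Arrow's actual code: the macro MATLAB_ARROW_EXPORT, the function matlab_arrow_sum_int64 and the namespace internal_arrow_standin are hypothetical names, and the stand-in namespace takes the place of the statically linked Arrow code so that the example compiles without Arrow. It assumes the library is built with -fvisibility=hidden on GCC/Clang (on Windows, symbols are hidden unless marked dllexport). With GNU ld, passing -Wl,--exclude-libs,ALL is one way to also keep symbols from static archives out of the exported set.

```cpp
// matlab_arrow.cpp: hypothetical facade library (all names are illustrative).
#include <cstdint>
#include <numeric>
#include <vector>

// Only functions marked with this macro are meant to be exported.
// Assumption: the library is compiled with -fvisibility=hidden on GCC/Clang.
#if defined(_WIN32)
#  define MATLAB_ARROW_EXPORT __declspec(dllexport)
#else
#  define MATLAB_ARROW_EXPORT __attribute__((visibility("default")))
#endif

// Stand-in for the statically linked Arrow code (step 1). In a real build
// this would be Arrow itself; it is never part of the exported API.
namespace internal_arrow_standin {
inline std::int64_t SumColumn(const std::vector<std::int64_t>& values) {
  return std::accumulate(values.begin(), values.end(), std::int64_t{0});
}
}  // namespace internal_arrow_standin

// Step 2: a narrow C API that uses only plain C types, so that no
// library-internal C++ symbol appears in the exported interface (step 3).
// Returns 0 on success and -1 on invalid arguments.
extern "C" MATLAB_ARROW_EXPORT int matlab_arrow_sum_int64(
    const std::int64_t* data, std::int64_t length, std::int64_t* out) {
  if (data == nullptr || out == nullptr || length < 0) {
    return -1;
  }
  const std::vector<std::int64_t> values(data, data + length);
  *out = internal_arrow_standin::SumColumn(values);
  return 0;
}
```

Whether this stays practical depends, as noted below, on how much of Arrow the caller needs: every capability has to be re-exposed through the facade.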

I suppose it depends whether the API needed by MATLAB is small or large.

Regards

Antoine.


Le 10/03/2021 à 04:36, Tahsin Hassan a écrit :
Hi Wes,

Thanks for providing the feedback.

   1.  Regarding the symbol versioning change
As a next step, should I open a Jira issue and assign it to myself, to see 
whether I can automate adding the BEGIN_ARROW_NS / END_ARROW_NS macros using 
LLVM Clang AST matchers?
We can submit this as a one-time mechanical change, as you mentioned.

However, should “setting up a coding guideline ensuring that all headers 
include `arrow/config.h`” also be part of the same issue?
Or is there some other buy-in process for such guidelines?



   2.  There is another issue, library versioning, which comes even before 
the symbol versioning issue.
Right now, both pyarrow and MATLAB ship a DLL named arrow.dll on Windows. 
If pyarrow/Python brings in its arrow.dll first, the MATLAB process will not 
load our shipped arrow.dll to begin with, due to RTLD_GLOBAL.

We can handle the library versioning issue by changing some of the CMake 
infrastructure to let users specify their own:
SOVERSION             - on Linux/macOS
VERSION/OUTPUT_NAME   - on Windows (not sure yet, just a hunch for Windows)

Currently, these CMake variables are set here:
https://github.com/apache/arrow/blob/0f72bcfc67bb4db72c27f9c3282fe5020490f214/cpp/cmake_modules/BuildUtils.cmake#L369
Once again, I would be happy to create another Jira issue and see whether I 
could work through the change.

Regards,
Tahsin


From: Wes McKinney <wesmck...@gmail.com>
Date: Sunday, March 7, 2021 at 4:12 PM
To: dev <dev@arrow.apache.org>
Subject: Re: [C++] libarrow isolation
I took a look at the document. Basically you want to have two
different versions of the Arrow shared library loaded into the same
process, with some code linked to one library and some code linked to
another. This is very similar to the problem that Boost addresses with
the `bcp --namespace=$MY_PRIVATE_BOOST_NAMESPACE ...` operation.

To obtain strict symbol isolation you have to change the "arrow::"
namespace in the C++ libraries you are shipping in your application.
AFAIK, this is a bit of a nuisance to do. One way to achieve it is to
replace every use of

namespace arrow {
...
}

with

BEGIN_ARROW_NS
...
END_ARROW_NS

or similar. Then you would do something like `cmake
-DARROW_NAMESPACE=mwarrow ...` when configuring your build.

This change could certainly be implemented in the Arrow library, which
is a one-time mechanical operation but will create some ongoing
non-intuitiveness anytime new files are created. All headers must also
therefore depend on a central `arrow/config.h` which contains the
needed namespace macros.
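
A minimal sketch of what such macros could look like, assuming a central header that every other header includes. The macro names follow the ones used above; the default value, the stringify helpers and the NamespaceName function are illustrative assumptions, not existing Arrow code.

```cpp
// Hypothetical contents of a central arrow/config.h.
#include <string>

// Overridable at build time, e.g. by compiling with -DARROW_NAMESPACE=mwarrow.
#ifndef ARROW_NAMESPACE
#  define ARROW_NAMESPACE arrow
#endif

#define BEGIN_ARROW_NS namespace ARROW_NAMESPACE {
#define END_ARROW_NS }

// Two-level expansion so that ARROW_NAMESPACE is expanded before stringifying.
#define ARROW_STRINGIFY_IMPL(x) #x
#define ARROW_STRINGIFY(x) ARROW_STRINGIFY_IMPL(x)

// Any other header would then open the namespace through the macros
// instead of writing `namespace arrow { ... }` directly.
BEGIN_ARROW_NS

// Illustrative function: reports which namespace the code was compiled into.
inline std::string NamespaceName() {
  return ARROW_STRINGIFY(ARROW_NAMESPACE);
}

END_ARROW_NS
```

Because the namespace is part of every mangled C++ symbol name, a library built with a different ARROW_NAMESPACE no longer shares symbol names with a stock libarrow loaded into the same process.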

On Fri, Mar 5, 2021 at 9:19 AM Antoine Pitrou <anto...@python.org> wrote:


Hi Tahsin,

Sorry for the lack of response. Unfortunately I feel a bit out of my
depth on linker issues. I hope someone else can give advice.

Regards

Antoine.


Le 05/03/2021 à 16:09, Tahsin Hassan a écrit :
Hi,

I was wondering whether folks had a chance to look over the material and had 
any pointers for the proposed approach.
If I should post in some other format or clarify something, please let me know.
In the meantime, I will try out the steps we propose.

Regards,
Tahsin


From: Tahsin Hassan <thas...@mathworks.com>
Date: Thursday, February 25, 2021 at 11:43 AM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: [C++] libarrow isolation
Hi Antoine,

I struggled a bit to put all my thoughts into an easily consumable email 
format.
So, I wrote up a GitHub markdown document to add some more detail on the 
issue we are facing.

Could you take a look, and let us know your thoughts?
https://github.com/mathworks/matlab-arrow-support-files/blob/main/libarrowclash.md

Regards,
Tahsin

From: Antoine Pitrou <anto...@python.org>
Date: Tuesday, February 23, 2021 at 1:21 PM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: [C++] libarrow isolation

Hi Tahsin,

I see. So the error happens when loading PyArrow into MATLAB, I
suppose? What kind of error do you get?

Regards

Antoine.


Le 23/02/2021 à 18:12, Tahsin Hassan a écrit :
Hi Antoine,

MATLAB is using RTLD_GLOBAL. Hope that helps in clarifying the workflow.

Regards,
Tahsin

________________________________
From: Antoine Pitrou <anto...@python.org>
Sent: Monday, February 22, 2021 9:41 AM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: [C++] libarrow isolation


Le 22/02/2021 à 15:29, Tahsin Hassan a écrit :
Hi all,

MATLAB uses the Arrow C++ libraries (i.e. libarrow.so) to read and write 
Parquet files (https://www.mathworks.com/help/matlab/ref/parquetread.html). 
While exploring ways to integrate more tightly with Arrow, we've run into a 
symbol/library naming clash issue.

When running pyarrow within the MATLAB process 
(https://www.mathworks.com/help/matlab/call-python-libraries.html?s_tid=CRUX_lftnav), 
the libarrow.so loaded by pyarrow clashes with the libarrow.so shipping 
with MATLAB.

For the record, Python loads extension modules (such as PyArrow) with
RTLD_LOCAL. I assume MATLAB doesn't?
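
For readers unfamiliar with the two modes, here is a minimal POSIX-only sketch of the difference. The helper OpenSharedLibrary and the LoadResult struct are hypothetical names for illustration; only dlopen, dlerror and the RTLD_* flags are real. With RTLD_LOCAL, the symbols of the loaded library are not made available for resolving references from libraries loaded later; with RTLD_GLOBAL they are, which is how two libraries with the same symbol names can interfere. On older glibc versions the program may need to be linked with -ldl.

```cpp
// POSIX-only sketch: load a shared library with local or global visibility.
#include <dlfcn.h>

#include <string>

// Result of a load attempt: a handle (nullptr on failure) and an error text.
struct LoadResult {
  void* handle;
  std::string error;
};

// Hypothetical helper. `global` selects RTLD_GLOBAL instead of RTLD_LOCAL.
// Passing nullptr as `path` returns a handle for the main program.
inline LoadResult OpenSharedLibrary(const char* path, bool global) {
  const int flags = RTLD_NOW | (global ? RTLD_GLOBAL : RTLD_LOCAL);
  dlerror();  // clear any stale error state
  void* handle = dlopen(path, flags);
  const char* err = (handle == nullptr) ? dlerror() : nullptr;
  return LoadResult{handle, err != nullptr ? err : ""};
}
```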

Regards

Antoine.



