[ 
https://issues.apache.org/jira/browse/ARROW-12585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335105#comment-17335105
 ] 

Rodrigo Tobar commented on ARROW-12585:
---------------------------------------

[~apitrou], [~kou] thanks for your comments. I suspected this might not be 
something you support, but I still wanted to report it for future 
reference.

From the scenarios above:
 1. I wanted to avoid compiling anything if possible. However I *did* try 
exactly what you suggest, only to find that pyarrow cannot be built because 
arrow-python cannot be found. This was also the case when trying to build 
pyarrow from the original git sources. I saw pyarrow has a number of toggles to 
pass down cmake options and other compilation flags, but I didn't play with all 
of them to be honest. In case it helps, this is the error:

{code}
  -- Could NOT find ArrowPython (missing: ArrowPython_DIR)
  CMake Error at /usr/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:146 (message):
    Could NOT find ArrowPython (missing: ARROW_PYTHON_INCLUDE_DIR
    ARROW_PYTHON_LIB_DIR) (found version "4.0.0")
{code}

You can reproduce this using the image posted in this ticket, starting a 
container from it with a bash terminal. Uninstall {{pyarrow}}, install 
{{libarrow-python400}} and try to install {{pyarrow}} from source (will need to 
{{apt install git}} as well).

2. I'm not sure how this would work with a cmake-based project? I guess I'd 
have to stop using arrow's export cmake config.

3. I always avoid conda when possible, but this time it could be an option. 
I tried this out and I can build my library and load it together with pyarrow.
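
For anyone debugging the scenario in the quoted description below, here is a minimal sketch (not part of Arrow or of the attached example) that lists which copies of a library are mapped into the current Python process. It is Linux-only because it reads {{/proc/self/maps}}; the helper name {{loaded_libraries}} is hypothetical and introduced purely for illustration.

```python
from pathlib import Path


def loaded_libraries(name_fragment, maps_text=None):
    """Return sorted, de-duplicated paths of mapped files whose
    basename contains name_fragment (hypothetical helper)."""
    if maps_text is None:
        # Linux-only assumption: one memory mapping per line.
        maps_text = Path("/proc/self/maps").read_text()
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        # The sixth field, when present, is the path of the mapped file.
        if len(fields) == 6 and fields[5].startswith("/"):
            if name_fragment in Path(fields[5]).name:
                paths.add(fields[5])
    return sorted(paths)


if __name__ == "__main__":
    try:
        import pyarrow  # noqa: F401  (loads the wheel's bundled libarrow)
    except ImportError:
        pass
    try:
        for path in loaded_libraries("libarrow.so"):
            print(path)
    except OSError:
        print("/proc/self/maps is not available on this platform")
```

Running this after importing pyarrow, and again after loading the library built against the apt packages, should show whether one or two copies of {{libarrow.so.400}} end up in the process.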

> Published apt packages incompatible with pip binary wheels
> ----------------------------------------------------------
>
>                 Key: ARROW-12585
>                 URL: https://issues.apache.org/jira/browse/ARROW-12585
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Packaging, Python
>    Affects Versions: 4.0.0
>            Reporter: Rodrigo Tobar
>            Priority: Major
>         Attachments: example.tar.gz
>
>
> We have a shared library that uses the shared {{libarrow}} and {{libplasma}} 
> libraries. Our shared library is then eventually loaded by a python 
> process where we also use {{pyarrow}}. To avoid compilation of arrow/plasma 
> we are installing the {{libarrow-dev}} and {{libplasma-dev}} apt packages (as 
> per the official [instructions|https://arrow.apache.org/install/]) and the 
> binary wheel of {{pyarrow}}.
> Each method brings its own copy of {{libarrow.so.400}}, and it turns out the 
> two libraries are not equal: the library contained within {{pyarrow}} is 
> compiled most probably with an older gcc version than that installed via apt, 
> which is compiled using the newer CXX11 ABI from libstdc++. This wouldn't 
> have any visible effects, except that {{std::string}} is used (and maybe more 
> affected types) in some arrow API points. The difference in the ABI used to 
> compile {{libarrow.so.400}} eventually means they contain differently named 
> symbols. 
> Back to our shared library, we load it in a python process. When this 
> happens, and if {{pyarrow}} has already been imported, then *its* copy of 
> {{libarrow.so.400}} is already in memory, and loading our shared library 
> doesn't load the "apt" copy of {{libarrow.so.400}}. This means our library 
> doesn't trigger the loading of the copy of {{libarrow.so.400}} that it was 
> compiled against, and if our library refers to one of the symbols that has 
> changed name then it fails to load due to this missing symbol.
> I've attached a fairly minimal example: a Dockerfile prepares a system with 
> libarrow-dev from apt and a binary pyarrow wheel from PyPI. It then compiles 
> a shared library against libarrow-dev. The command run by default by the 
> container is a small test that runs python and loads the example shared 
> library, both with and without loading pyarrow first. When pyarrow is loaded 
> first then a missing symbol error happens and the shared library fails to 
> load.
> I've experienced this in an Ubuntu-based linux distro and against Arrow 
> 4.0.0, but I'd assume this happens in other distros and versions.
> The workaround we are using at the moment is simple: we are installing a 
> pyarrow version that is different from the arrow version installed via apt. 
> We are lucky we can run in this mixed-version, multiple-libraries-loaded 
> scenario, but it might not be for everyone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
