[
https://issues.apache.org/jira/browse/ARROW-12585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335105#comment-17335105
]
Rodrigo Tobar commented on ARROW-12585:
---------------------------------------
[~apitrou], [~kou] thanks for your comments. I was afraid this might not be
something you would support, but I still wanted to report it for future
reference.
From the scenarios above:
1. I wanted to avoid compiling anything if possible. However, I *did* try
exactly what you suggest, only to find that pyarrow cannot be built because
arrow-python cannot be found. This was also the case when trying to build
pyarrow from the original git sources. I saw pyarrow has a number of toggles to
pass down cmake options and other compilation flags, but to be honest I didn't
try all of them. In case it helps, this is the error:
{code}
-- Could NOT find ArrowPython (missing: ArrowPython_DIR)
CMake Error at /usr/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:146 (message):
Could NOT find ArrowPython (missing: ARROW_PYTHON_INCLUDE_DIR
ARROW_PYTHON_LIB_DIR) (found version "4.0.0")
{code}
You can reproduce this using the image posted in this ticket, starting a
container from it with a bash terminal. Uninstall {{pyarrow}}, install
{{libarrow-python400}} and try to install {{pyarrow}} from source (you will
need to {{apt install git}} as well).
2. I'm not sure how this would work with a cmake-based project? I guess I'd
have to stop using arrow's export cmake config.
3. I always avoid conda when possible, but this time it could be an option.
I tried this out and I can build my library and load it together with pyarrow.
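
For anyone checking a similar setup, here is a minimal Python sketch of one way to see which copies of {{libarrow}} are mapped into a Linux process. It assumes a Linux {{/proc}} filesystem; the function names are hypothetical and not part of Arrow or pyarrow. It only reads the process memory map and does not inspect ABI or symbols.

```python
import re

# Matches the trailing path column of a /proc/<pid>/maps line when the file
# name starts with "libarrow.so" (so libarrow_python.so is not matched).
_LIBARROW_RE = re.compile(r"(\S*/libarrow\.so[^\s/]*)\s*$")


def loaded_libarrow_paths(maps_text):
    """Return distinct libarrow paths found in maps_text, in first-seen order.

    maps_text is expected to have the layout of /proc/<pid>/maps on Linux.
    """
    paths = []
    for line in maps_text.splitlines():
        match = _LIBARROW_RE.search(line)
        if match and match.group(1) not in paths:
            paths.append(match.group(1))
    return paths


def current_process_libarrow_paths():
    """Convenience wrapper for the running process (Linux only)."""
    with open("/proc/self/maps") as maps_file:
        return loaded_libarrow_paths(maps_file.read())
```

Calling {{current_process_libarrow_paths()}} after importing pyarrow and loading the shared library (for example with {{ctypes.CDLL}}) would be expected to list a single path when only one copy is mapped, and two paths when both the wheel copy and the apt copy end up loaded.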
> Published apt packages incompatible with pip binary wheels
> ----------------------------------------------------------
>
> Key: ARROW-12585
> URL: https://issues.apache.org/jira/browse/ARROW-12585
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Packaging, Python
> Affects Versions: 4.0.0
> Reporter: Rodrigo Tobar
> Priority: Major
> Attachments: example.tar.gz
>
>
> We have a shared library that uses the shared {{libarrow}} and {{libplasma}}
> libraries. Our shared library is then eventually loaded by a python
> process where we also use {{pyarrow}}. To avoid compilation of arrow/plasma
> we are installing the {{libarrow-dev}} and {{libplasma-dev}} apt packages (as
> per the official [instructions|https://arrow.apache.org/install/]) and the
> binary wheel of {{pyarrow}}.
> Each method brings its own copy of {{libarrow.so.400}}, and it turns out the
> two libraries are not equal: the library contained within {{pyarrow}} is
> compiled most probably with an older gcc version than that installed via apt,
> which is compiled using the newer CXX11 ABI from libstdc++. This wouldn't
> have any visible effects, except that {{std::string}} is used (and maybe more
> affected types) in some arrow API points. The difference in the ABI used to
> compile {{libarrow.so.400}} eventually means they contain differently named
> symbols.
> Back to our shared library, we load it in a python process. When this
> happens, and if the {{pyarrow}} has already been imported, then *its* copy of
> {{libarrow.so.400}} is already in memory, and loading our shared library
> doesn't load the "apt" copy of {{libarrow.so.400}}. This means our library
> doesn't trigger the loading of the copy of {{libarrow.so.400}} that it was
> compiled against, and if our library refers to one of the symbols that has
> changed name then it fails to load due to this missing symbol.
> I've attached a fairly minimal example: a Dockerfile prepares a system with
> libarrow-dev from apt and a binary pyarrow wheel from PyPI. It then compiles
> a shared library against libarrow-dev. The command run by default by the
> container is a small test that runs python and loads the example shared
> library, both with and without loading pyarrow first. When pyarrow is loaded
> first then a missing symbol error happens and the shared library fails to
> load.
> I've experienced this in an Ubuntu-based linux distro and against Arrow
> 4.0.0, but I'd assume this happens in other distros and versions.
> The workaround we are using at the moment is simple: we are installing a
> pyarrow version that is different from the arrow version installed via apt.
> We are lucky we can run in this mixed-version, multiple-libraries-loaded
> scenario, but it might not work for everyone.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)