[
https://issues.apache.org/jira/browse/ARROW-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17031544#comment-17031544
]
Michael Marino commented on ARROW-5158:
---------------------------------------
Hi Wes, thanks for the response. I understand the issue, and that it isn't a
critical part of the immediate timeline. We currently work around it, so it is
not yet critical for us either. However, especially with AWS pushing
serverless for handling data workflows, I do expect it to become an issue for
us and for others sometime soon.
I have started looking at some possible solutions and will try to
submit a PR here, but I would need some guidance on the external
requirements of the package. Given the conversation about this
[here|https://discuss.python.org/t/symbolic-links-in-wheels/1945/5], it sounds
like the libraries are packaged so that they are usable by other tools
(e.g. pyspark?). If this is *not* the case, then I would focus on
updating how the library is loaded from within pyarrow itself, to handle the
case where the library comes from within the wheel.
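A minimal sketch of that second option, assuming the wheel ships only the
versioned files (e.g. libarrow.so.14) and the unversioned name has to be
resolved at load time. The helper name resolve_versioned_library is
hypothetical and not part of pyarrow; the real loading code may work quite
differently.

```python
import os
import re


def resolve_versioned_library(libdir, soname):
    """Return the path for `soname` in `libdir`.

    Hypothetical helper (not part of pyarrow): if the unversioned file
    (e.g. libarrow.so) is missing, fall back to a versioned file such as
    libarrow.so.14 found in the same directory.
    """
    exact = os.path.join(libdir, soname)
    if os.path.exists(exact):
        return exact

    # Accept only names of the form <soname>.<digits>[.<digits>...], so that
    # libarrow.so does not accidentally match libarrow_python.so.14.
    pattern = re.compile(re.escape(soname) + r"(\.\d+)+$")
    candidates = sorted(f for f in os.listdir(libdir) if pattern.match(f))
    if not candidates:
        raise FileNotFoundError(
            "no library matching %r found in %r" % (soname, libdir)
        )
    return os.path.join(libdir, candidates[0])
```

The returned path could then be handed to something like ctypes.CDLL, but
whether that is enough for other tools that link against these libraries is
exactly the open question above.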
> [Packaging][Wheel] Symlink libraries in wheels
> ----------------------------------------------
>
> Key: ARROW-5158
> URL: https://issues.apache.org/jira/browse/ARROW-5158
> Project: Apache Arrow
> Issue Type: Bug
> Components: Packaging, Python
> Reporter: Krisztian Szucs
> Priority: Major
> Labels: wheel
>
> Libraries are copied instead of symlinked in linux and osx wheels, which
> results in quite big binaries.
>
> This is what the wheel contains before running auditwheel:
>
> {code}
> -rwxr-xr-x 1 root root 128K Apr 3 09:02 libarrow_boost_filesystem.so
> -rwxr-xr-x 1 root root 128K Apr 3 09:02 libarrow_boost_filesystem.so.1.66.0
> -rwxr-xr-x 1 root root 1.2M Apr 3 09:02 libarrow_boost_regex.so
> -rwxr-xr-x 1 root root 1.2M Apr 3 09:02 libarrow_boost_regex.so.1.66.0
> -rwxr-xr-x 1 root root 30K Apr 3 09:02 libarrow_boost_system.so
> -rwxr-xr-x 1 root root 30K Apr 3 09:02 libarrow_boost_system.so.1.66.0
> -rwxr-xr-x 1 root root 1.4M Apr 3 09:02 libarrow_python.so
> -rwxr-xr-x 1 root root 1.4M Apr 3 09:02 libarrow_python.so.14
> -rwxr-xr-x 1 root root 12M Apr 3 09:02 libarrow.so
> -rwxr-xr-x 1 root root 12M Apr 3 09:02 libarrow.so.14
> -rw-r--r-- 1 root root 6.1M Apr 3 09:02 lib.cpp
> -rwxr-xr-x 1 root root 2.4M Apr 3 09:02 lib.cpython-36m-x86_64-linux-gnu.so
> -rwxr-xr-x 1 root root 55M Apr 3 09:02 libgandiva.so
> -rwxr-xr-x 1 root root 55M Apr 3 09:02 libgandiva.so.14
> -rwxr-xr-x 1 root root 2.9M Apr 3 09:02 libparquet.so
> -rwxr-xr-x 1 root root 2.9M Apr 3 09:02 libparquet.so.14
> -rwxr-xr-x 1 root root 309K Apr 3 09:02 libplasma.so
> -rwxr-xr-x 1 root root 309K Apr 3 09:02 libplasma.so.14
> {code}
> After running auditwheel, the repaired wheel contains:
>
> {code}
> -rwxr-xr-x 1 root root 128K Apr 3 09:02 libarrow_boost_filesystem.so
> -rwxr-xr-x 1 root root 128K Apr 3 09:02 libarrow_boost_filesystem.so.1.66.0
> -rwxr-xr-x 1 root root 1.2M Apr 3 09:02 libarrow_boost_regex.so
> -rwxr-xr-x 1 root root 1.2M Apr 3 09:02 libarrow_boost_regex.so.1.66.0
> -rwxr-xr-x 1 root root 30K Apr 3 09:02 libarrow_boost_system.so
> -rwxr-xr-x 1 root root 30K Apr 3 09:02 libarrow_boost_system.so.1.66.0
> -rwxr-xr-x 1 root root 1.6M Apr 3 09:55 libarrow_python.so
> -rwxr-xr-x 1 root root 1.4M Apr 3 09:02 libarrow_python.so.14
> -rwxr-xr-x 1 root root 12M Apr 3 09:55 libarrow.so
> -rwxr-xr-x 1 root root 12M Apr 3 09:02 libarrow.so.14
> -rw-r--r-- 1 root root 6.1M Apr 3 09:02 lib.cpp
> -rwxr-xr-x 1 root root 2.5M Apr 3 09:55 lib.cpython-36m-x86_64-linux-gnu.so
> -rwxr-xr-x 1 root root 59M Apr 3 09:55 libgandiva.so
> -rwxr-xr-x 1 root root 55M Apr 3 09:02 libgandiva.so.14
> -rwxr-xr-x 1 root root 3.5M Apr 3 09:55 libparquet.so
> -rwxr-xr-x 1 root root 2.9M Apr 3 09:02 libparquet.so.14
> -rwxr-xr-x 1 root root 345K Apr 3 09:55 libplasma.so
> -rwxr-xr-x 1 root root 309K Apr 3 09:02 libplasma.so.14
> {code}
>
> Here is the output of auditwheel:
> [https://travis-ci.org/kszucs/crossbow/builds/514605723#L3340]
> The unversioned libraries should be symlinks; we have special code for this:
> https://github.com/apache/arrow/blob/4495305092411e8551c60341e273c8aa3c14b282/python/setup.py#L489-L499
> The symlinks are probably not making it into the wheel, because wheels are
> zip files and those don't preserve symlinks by default. So we probably need
> to pass the `--symlinks` parameter to the wheel code.
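
The copying behaviour described above can be reproduced with Python's standard
zipfile module. This is only an illustrative sketch: the file name
libexample.so.14 and the helpers zipped_sizes and demo are made up for the
example, and it does not use the actual wheel-building code. It needs a
platform where os.symlink is available.

```python
import os
import tempfile
import zipfile


def zipped_sizes(directory, archive_path):
    """Zip every entry in `directory`; return {name: uncompressed size}."""
    with zipfile.ZipFile(archive_path, "w") as zf:
        for name in sorted(os.listdir(directory)):
            # ZipFile.write() follows symlinks, so a link is stored as a
            # full copy of the file it points to.
            zf.write(os.path.join(directory, name), arcname=name)
    with zipfile.ZipFile(archive_path) as zf:
        return {info.filename: info.file_size for info in zf.infolist()}


def demo():
    """Build a fake versioned library plus a symlink to it and zip both."""
    with tempfile.TemporaryDirectory() as tmp:
        libdir = os.path.join(tmp, "libs")
        os.mkdir(libdir)
        # Hypothetical 4 KiB stand-in for a real shared library.
        with open(os.path.join(libdir, "libexample.so.14"), "wb") as f:
            f.write(b"\0" * 4096)
        os.symlink("libexample.so.14", os.path.join(libdir, "libexample.so"))
        return zipped_sizes(libdir, os.path.join(tmp, "out.zip"))


if __name__ == "__main__":
    print(demo())
```

Both archive entries report the full 4096 bytes, which matches the duplicated
sizes in the listings above.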
--
This message was sent by Atlassian Jira
(v8.3.4#803005)