The only magic that auditwheel does on the Linux package is that it pulls
our shared version of libz.so into the wheel; otherwise there should be no
differences in the wheel contents.
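
A minimal sketch for checking this, assuming both the original and the
repaired wheel are still on disk. The function names and the paths in the
usage note are illustrative only; they are not part of auditwheel or of the
Arrow build scripts. It relies on a wheel being a zip archive and compares
the two archives member by member:

```python
import hashlib
import zipfile


def wheel_digests(wheel_path):
    """Map each file in the wheel to (uncompressed size, SHA-256 digest)."""
    digests = {}
    with zipfile.ZipFile(wheel_path) as wheel:
        for info in wheel.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            data = wheel.read(info.filename)
            digests[info.filename] = (
                info.file_size,
                hashlib.sha256(data).hexdigest(),
            )
    return digests


def diff_wheels(before_path, after_path):
    """Return (added, removed, changed) member names between two wheels."""
    before = wheel_digests(before_path)
    after = wheel_digests(after_path)
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(
        name for name in set(before) & set(after)
        if before[name] != after[name]
    )
    return added, removed, changed
```

Calling `diff_wheels("dist/original.whl", "wheelhouse/repaired.whl")` (both
paths hypothetical) would list the files that the repair step added, removed
or rewrote.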

Uwe

On Wed, Apr 3, 2019, at 12:06 PM, Krisztián Szűcs wrote:
> This is what the wheel contains before running auditwheel:
> 
> -rwxr-xr-x  1 root root 128K Apr  3 09:02 libarrow_boost_filesystem.so
> -rwxr-xr-x  1 root root 128K Apr  3 09:02 libarrow_boost_filesystem.so.1.66.0
> -rwxr-xr-x  1 root root 1.2M Apr  3 09:02 libarrow_boost_regex.so
> -rwxr-xr-x  1 root root 1.2M Apr  3 09:02 libarrow_boost_regex.so.1.66.0
> -rwxr-xr-x  1 root root  30K Apr  3 09:02 libarrow_boost_system.so
> -rwxr-xr-x  1 root root  30K Apr  3 09:02 libarrow_boost_system.so.1.66.0
> -rwxr-xr-x  1 root root 1.4M Apr  3 09:02 libarrow_python.so
> -rwxr-xr-x  1 root root 1.4M Apr  3 09:02 libarrow_python.so.14
> -rwxr-xr-x  1 root root  12M Apr  3 09:02 libarrow.so
> -rwxr-xr-x  1 root root  12M Apr  3 09:02 libarrow.so.14
> -rw-r--r--  1 root root 6.1M Apr  3 09:02 lib.cpp
> -rwxr-xr-x  1 root root 2.4M Apr  3 09:02 lib.cpython-36m-x86_64-linux-gnu.so
> -rwxr-xr-x  1 root root  55M Apr  3 09:02 libgandiva.so
> -rwxr-xr-x  1 root root  55M Apr  3 09:02 libgandiva.so.14
> -rwxr-xr-x  1 root root 2.9M Apr  3 09:02 libparquet.so
> -rwxr-xr-x  1 root root 2.9M Apr  3 09:02 libparquet.so.14
> -rwxr-xr-x  1 root root 309K Apr  3 09:02 libplasma.so
> -rwxr-xr-x  1 root root 309K Apr  3 09:02 libplasma.so.14
> 
> After running auditwheel, the repaired wheel contains:
> 
> -rwxr-xr-x  1 root root 128K Apr  3 09:02 libarrow_boost_filesystem.so
> -rwxr-xr-x  1 root root 128K Apr  3 09:02 libarrow_boost_filesystem.so.1.66.0
> -rwxr-xr-x  1 root root 1.2M Apr  3 09:02 libarrow_boost_regex.so
> -rwxr-xr-x  1 root root 1.2M Apr  3 09:02 libarrow_boost_regex.so.1.66.0
> -rwxr-xr-x  1 root root  30K Apr  3 09:02 libarrow_boost_system.so
> -rwxr-xr-x  1 root root  30K Apr  3 09:02 libarrow_boost_system.so.1.66.0
> -rwxr-xr-x  1 root root 1.6M Apr  3 09:55 libarrow_python.so
> -rwxr-xr-x  1 root root 1.4M Apr  3 09:02 libarrow_python.so.14
> -rwxr-xr-x  1 root root  12M Apr  3 09:55 libarrow.so
> -rwxr-xr-x  1 root root  12M Apr  3 09:02 libarrow.so.14
> -rw-r--r--  1 root root 6.1M Apr  3 09:02 lib.cpp
> -rwxr-xr-x  1 root root 2.5M Apr  3 09:55 lib.cpython-36m-x86_64-linux-gnu.so
> -rwxr-xr-x  1 root root  59M Apr  3 09:55 libgandiva.so
> -rwxr-xr-x  1 root root  55M Apr  3 09:02 libgandiva.so.14
> -rwxr-xr-x  1 root root 3.5M Apr  3 09:55 libparquet.so
> -rwxr-xr-x  1 root root 2.9M Apr  3 09:02 libparquet.so.14
> -rwxr-xr-x  1 root root 345K Apr  3 09:55 libplasma.so
> -rwxr-xr-x  1 root root 309K Apr  3 09:02 libplasma.so.14
> 
> Here is the output of auditwheel:
> https://travis-ci.org/kszucs/crossbow/builds/514605723#L3340
> 
> On Wed, Apr 3, 2019 at 10:36 AM Antoine Pitrou <anto...@python.org> wrote:
> 
> >
> > On 03/04/2019 at 02:23, Wes McKinney wrote:
> > >
> > > $ ll Library/lib/
> > > total 741796
> > > -rw-r--r-- 1 wesm wesm   1507048 Mar 27 23:34 arrow.lib
> > > -rw-r--r-- 1 wesm wesm     76184 Mar 27 23:35 arrow_python.lib
> > > -rw-r--r-- 1 wesm wesm  61322082 Mar 27 23:36 arrow_python_static.lib
> > > -rw-r--r-- 1 wesm wesm 328090044 Mar 27 23:37 arrow_static.lib
> > > drwxr-xr-x 3 wesm wesm      4096 Apr  2 19:12 cmake/
> > > -rw-r--r-- 1 wesm wesm    302496 Mar 27 23:38 gandiva.lib
> > > -rw-r--r-- 1 wesm wesm 239314018 Mar 27 23:40 gandiva_static.lib
> > > -rw-r--r-- 1 wesm wesm    491292 Mar 27 23:41 parquet.lib
> > > -rw-r--r-- 1 wesm wesm 128473780 Mar 27 23:42 parquet_static.lib
> > > drwxr-xr-x 2 wesm wesm      4096 Apr  2 19:12 pkgconfig/
> > >
> > > As a mitigating measure in the meantime, I would suggest that we stop
> > > bundling the static libraries in the arrow-cpp conda package, since
> > > we're just hurting release managers and users with a large package
> > > download when they `conda install pyarrow`.
> >
> > Agreed.
> >
> > > Can someone open a JIRA
> > > issue about this?
> >
> > See https://issues.apache.org/jira/browse/ARROW-5101
> >
> > > There's something very odd here, though, which is that libgandiva.so
> > > and libgandiva.so.13 appear to be distinct.
> >
> > > Not only that.  libparquet.so, libplasma.so and libarrow.so are distinct as
> > well.  This means that we may be building those libraries twice instead
> > of copying the files.
> >
> > By the way, I don't understand why those are not symlinks.
> >
> Me neither, but I guess setup.py bdist_wheel doesn't support symlinks.
> 
> >
> > > That seems buggy to me. We might also investigate if there's a way to
> > > trim the binary sizes in some way.
> >
> > Well, there's always "strip -s", but it doesn't seem to remove much
> > (libgandiva.so shrinks from 60 to 50 MB, and you lose all debug
> > information).
> >
> > One issue seems to be that libgandiva.so links LLVM statically, but
> > doesn't hide LLVM symbols.  That said, libllvmlite.so (which hides LLVM
> > symbols) has grown quite large recently as well (around 40 MB).
> >
> > Perhaps Gandiva needs to be packaged separately...
> >
> > Regards
> >
> > Antoine.
> >
>
