hi folks, I noticed that the arrow-cpp conda packages for Windows have ballooned in size to nearly 140 megabytes for RC4
https://bintray.com/apache/arrow/python-rc/0.13.0-rc4#files/python-rc/0.13.0-rc4

Looking at one of these packages it seems the Windows static libraries are huge -- I'm not sure why they are so big but we should probably investigate

$ ll Library/lib/
total 741796
-rw-r--r-- 1 wesm wesm   1507048 Mar 27 23:34 arrow.lib
-rw-r--r-- 1 wesm wesm     76184 Mar 27 23:35 arrow_python.lib
-rw-r--r-- 1 wesm wesm  61322082 Mar 27 23:36 arrow_python_static.lib
-rw-r--r-- 1 wesm wesm 328090044 Mar 27 23:37 arrow_static.lib
drwxr-xr-x 3 wesm wesm      4096 Apr  2 19:12 cmake/
-rw-r--r-- 1 wesm wesm    302496 Mar 27 23:38 gandiva.lib
-rw-r--r-- 1 wesm wesm 239314018 Mar 27 23:40 gandiva_static.lib
-rw-r--r-- 1 wesm wesm    491292 Mar 27 23:41 parquet.lib
-rw-r--r-- 1 wesm wesm 128473780 Mar 27 23:42 parquet_static.lib
drwxr-xr-x 2 wesm wesm      4096 Apr  2 19:12 pkgconfig/

As a mitigating measure in the meantime, I would suggest that we stop bundling the static libraries in the arrow-cpp conda package, since we're just hurting release managers and users with a large package download when they `conda install pyarrow`. Can someone open a JIRA issue about this? If packaging the static libraries in conda is something that people need then we could create a separate arrow-cpp-static artifact.

The production packages in conda-forge are a bit smaller (less than 100 MB), but still quite large.

https://anaconda.org/conda-forge/arrow-cpp/files

I noticed also that the wheel Python packages on Linux have become quite large. The Python 3.7 wheel is 48.5 megabytes, for example.
The expected culprit is libgandiva.so, where I see

-rwxr-xr-x 1 wesm wesm   131047 Apr  2 19:18 libarrow_boost_filesystem.so*
-rwxr-xr-x 1 wesm wesm   131047 Apr  2 19:18 libarrow_boost_filesystem.so.1.66.0*
-rwxr-xr-x 1 wesm wesm  1253641 Apr  2 19:18 libarrow_boost_regex.so*
-rwxr-xr-x 1 wesm wesm  1253641 Apr  2 19:18 libarrow_boost_regex.so.1.66.0*
-rwxr-xr-x 1 wesm wesm    30081 Apr  2 19:18 libarrow_boost_system.so*
-rwxr-xr-x 1 wesm wesm    30081 Apr  2 19:18 libarrow_boost_system.so.1.66.0*
-rwxr-xr-x 1 wesm wesm  1613712 Apr  2 19:18 libarrow_python.so*
-rwxr-xr-x 1 wesm wesm  1400561 Apr  2 19:18 libarrow_python.so.13*
-rwxr-xr-x 1 wesm wesm 12543416 Apr  2 19:18 libarrow.so*
-rwxr-xr-x 1 wesm wesm 11540172 Apr  2 19:18 libarrow.so.13*
-rw-r--r-- 1 wesm wesm  6393593 Apr  2 19:18 lib.cpp
-rwxr-xr-x 1 wesm wesm  2558504 Apr  2 19:18 lib.cpython-37m-x86_64-linux-gnu.so*
-rwxr-xr-x 1 wesm wesm 61260912 Apr  2 19:18 libgandiva.so*
-rwxr-xr-x 1 wesm wesm 57342916 Apr  2 19:18 libgandiva.so.13*
-rwxr-xr-x 1 wesm wesm  3567224 Apr  2 19:18 libparquet.so*
-rwxr-xr-x 1 wesm wesm  3035367 Apr  2 19:18 libparquet.so.13*
-rwxr-xr-x 1 wesm wesm   352440 Apr  2 19:18 libplasma.so*
-rwxr-xr-x 1 wesm wesm   315802 Apr  2 19:18 libplasma.so.13*

There's something very odd here, though, which is that libgandiva.so and libgandiva.so.13 appear to be distinct.
They have different checksums, for example

(pyarrow-0.13.0-py37-test) 19:19 ~/Downloads/arrow-cpp-py36-vc14 $ sha256sum ~/miniconda/envs/pyarrow-0.13.0-py37-test/lib/python3.7/site-packages/pyarrow/libgandiva.so
8f1026d7bf476b90a0cac8239947ad334ee91cd31a944102aff6e8a67ae973e8  /home/wesm/miniconda/envs/pyarrow-0.13.0-py37-test/lib/python3.7/site-packages/pyarrow/libgandiva.so

(pyarrow-0.13.0-py37-test) 19:21 ~/Downloads/arrow-cpp-py36-vc14 $ sha256sum ~/miniconda/envs/pyarrow-0.13.0-py37-test/lib/python3.7/site-packages/pyarrow/libgandiva.so.13
9969a50787f8e0411115c0bfffccd3a350fde5f8c2f319acd72f3cf8097365dc  /home/wesm/miniconda/envs/pyarrow-0.13.0-py37-test/lib/python3.7/site-packages/pyarrow/libgandiva.so.13

That seems buggy to me. We might also investigate if there's a way to trim the binary sizes in some way.

Thanks
Wes
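
P.S. For whoever looks into the libgandiva.so / libgandiva.so.13 mismatch: below is a rough Python sketch of the same checksum comparison, applied to every lib*.so in a directory rather than one file at a time. The function names (sha256_of, compare_versioned_libs) are made up for illustration and are not part of Arrow; the script uses only the standard library. Treat it as a starting point, not a tested tool.

```python
import glob
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_versioned_libs(lib_dir):
    """Compare each libfoo.so with its libfoo.so.N siblings in lib_dir.

    Hypothetical helper: returns one dict per (unversioned, versioned) pair.
    """
    lib_dir = Path(lib_dir)
    results = []
    for so in sorted(lib_dir.glob("*.so")):
        # Escape the file name so it is matched literally, then look for
        # any sibling with a version suffix (e.g. libgandiva.so.13).
        pattern = glob.escape(so.name) + ".*"
        for versioned in sorted(lib_dir.glob(pattern)):
            results.append({
                "name": so.name,
                "versioned": versioned.name,
                "is_symlink": so.is_symlink(),
                "size": so.stat().st_size,
                "versioned_size": versioned.stat().st_size,
                "identical": sha256_of(so) == sha256_of(versioned),
            })
    return results
```

Pointing compare_versioned_libs at the site-packages/pyarrow directory of the test environment and printing the entries where "identical" is False should list libgandiva.so, consistent with the sha256sum output above. The size fields make it easy to see which other pairs differ as well.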