raulcd commented on code in PR #42028:
URL: https://github.com/apache/arrow/pull/42028#discussion_r1633383874


##########
ci/scripts/python_wheel_manylinux_build.sh:
##########
@@ -160,6 +160,26 @@ export CMAKE_PREFIX_PATH=/tmp/arrow-dist
 pushd /arrow/python
 python setup.py bdist_wheel
 
+echo "=== Strip symbols from wheel ==="
+mkdir dist/temp-fix-wheel
+mv dist/pyarrow-*.whl dist/temp-fix-wheel
+
+pushd dist/temp-fix-wheel
+wheel_name=$(ls pyarrow-*.whl)
+# Unzip and remove old wheel
+unzip $wheel_name
+rm $wheel_name
+for filename in $(ls pyarrow/*.so pyarrow/*.so.*); do
+    echo "Stripping debug symbols from: $filename";
+    strip --strip-debug $filename
+done
+# Zip wheel again after stripping symbols
+zip -r $wheel_name .

Review Comment:
   Both wheels are compressed. A size comparison shows the following.
   The initial one, from CI on main:
   ```
   -rw-r--r--  1 raulcd raulcd  40M jun  6 08:46 pyarrow-17.0.0.dev260-cp310-cp310-manylinux_2_28_x86_64.whl
   ```
   and the new one, generated from this PR:
   ```
   -rw-r--r--  1 raulcd raulcd  38M jun  7 17:35 pyarrow-17.0.0.dev275-cp310-cp310-manylinux_2_28_x86_64.whl
   ```
   
   The compression type is exactly the same for every file, and the compressed sizes of all files other than the stripped shared libraries (and RECORD) are exactly the same. I tested this with the following snippet:
   ```
   import zipfile

   old_path = 'pyarrow-17.0.0.dev260-cp38-cp38-manylinux_2_28_x86_64.whl'
   new_path = 'pyarrow-17.0.0.dev275-cp38-cp38-manylinux_2_28_x86_64.whl'

   with zipfile.ZipFile(old_path, 'r') as old_wheel, \
           zipfile.ZipFile(new_path, 'r') as new_wheel:
       # Pair up the entries of both wheels by sorted filename.
       old_infos = sorted(old_wheel.infolist(), key=lambda x: x.filename)
       new_infos = sorted(new_wheel.infolist(), key=lambda x: x.filename)
       for info_old, info_new in zip(old_infos, new_infos):
           # Entries whose path embeds the version string are expected to differ.
           if "dev260" not in info_old.filename:
               assert info_old.filename == info_new.filename, \
                   info_old.filename + info_new.filename
           assert info_old.compress_type == info_new.compress_type
           # RECORD lists file hashes and sizes, so it changes with the libraries.
           if "RECORD" in info_old.filename:
               continue
           # Stripped shared libraries may shrink; every other file must match.
           if not (info_old.filename.endswith('.so') or
                   info_old.filename.endswith('.so.1700')):
               assert info_old.compress_size == info_new.compress_size, \
                   f"old: {info_old.filename, info_old.compress_size}, new: {info_new.filename, info_new.compress_size}"
           print(f'filename: {info_old.filename} ')
           print(f'compress type: {info_old.compress_type}')
           print(f'compress size: {info_new.compress_size}')
   ```
   I am not showing the full output, but this is an example of what I get:
   ```
   ...
   filename: pyarrow/vendored/docscrape.py 
   compress type: 8
   compress size: 6080
   filename: pyarrow/vendored/version.py 
   compress type: 8
   compress size: 3894
   filename: scripts/ 
   compress type: 0
   compress size: 0
   filename: scripts/test_imports.py 
   compress type: 8
   compress size: 496
   filename: scripts/test_leak.py 
   compress type: 8
   compress size: 1422
   
   ```
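
To see which entries account for the roughly 2 MB difference, the same `ZipInfo` metadata can be reduced to just the files whose uncompressed size changed. This is only a minimal sketch: the helper name `size_deltas` is hypothetical and not part of the PR, and it assumes entries can be matched by identical path in both wheels (so paths that embed the version string are simply ignored).

```python
import zipfile


def size_deltas(old_path, new_path):
    """Map each path present in both wheels to (old_size, new_size)
    when its uncompressed size differs. Hypothetical helper."""
    with zipfile.ZipFile(old_path) as old_wheel, \
            zipfile.ZipFile(new_path) as new_wheel:
        old = {info.filename: info.file_size for info in old_wheel.infolist()}
        new = {info.filename: info.file_size for info in new_wheel.infolist()}
    # Only paths found in both archives can be compared directly.
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }
```

If stripping is the only change, the result would be expected to contain only the shared libraries, plus any file that embeds the version string.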
   
   I am not an expert on zip compression, but all tests for the wheels are successful on the CI jobs, and my validations seem to show the exact same level of compression and the same size for all files in both wheels.
   Is there any other test you think I should run, or another way I could check this?
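
One possible additional check, sketched here under assumptions rather than taken from the PR: verify that the re-zipped wheel is internally consistent with `ZipFile.testzip()`, and compare the CRC-32 stored for each entry so that any file whose content changed is listed explicitly. The function name `compare_contents` is hypothetical, and entries are again matched by identical path.

```python
import zipfile


def compare_contents(old_path, new_path):
    """Return (first_bad_entry, changed_paths) for two wheels.
    Hypothetical helper, not part of the PR."""
    with zipfile.ZipFile(old_path) as old_wheel, \
            zipfile.ZipFile(new_path) as new_wheel:
        # testzip() reads every entry and returns the first one whose
        # CRC or header is bad, or None if the archive is consistent.
        first_bad = new_wheel.testzip()
        old_crc = {info.filename: info.CRC for info in old_wheel.infolist()}
        new_crc = {info.filename: info.CRC for info in new_wheel.infolist()}
    # A differing CRC-32 means the uncompressed content changed.
    changed = sorted(
        name for name in old_crc.keys() & new_crc.keys()
        if old_crc[name] != new_crc[name]
    )
    return first_bad, changed
```

A result of `None` for the first element, with only the stripped libraries (and any version-dependent files) in the second, would be consistent with the size comparison above.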



