raulcd commented on code in PR #42028:
URL: https://github.com/apache/arrow/pull/42028#discussion_r1633383874
##########
ci/scripts/python_wheel_manylinux_build.sh:
##########
@@ -160,6 +160,26 @@ export CMAKE_PREFIX_PATH=/tmp/arrow-dist
pushd /arrow/python
python setup.py bdist_wheel
+echo "=== Strip symbols from wheel ==="
+mkdir dist/temp-fix-wheel
+mv dist/pyarrow-*.whl dist/temp-fix-wheel
+
+pushd dist/temp-fix-wheel
+wheel_name=$(ls pyarrow-*.whl)
+# Unzip and remove old wheel
+unzip $wheel_name
+rm $wheel_name
+for filename in $(ls pyarrow/*.so pyarrow/*.so.*); do
+ echo "Stripping debug symbols from: $filename";
+ strip --strip-debug $filename
+done
+# Zip wheel again after stripping symbols
+zip -r $wheel_name .
Review Comment:
Both wheels are compressed. A size comparison shows the following.
The initial one, from CI on main:
```
-rw-r--r-- 1 raulcd raulcd 40M jun 6 08:46 pyarrow-17.0.0.dev260-cp310-cp310-manylinux_2_28_x86_64.whl
```
and the new one, generated from this PR:
```
-rw-r--r-- 1 raulcd raulcd 38M jun 7 17:35 pyarrow-17.0.0.dev275-cp310-cp310-manylinux_2_28_x86_64.whl
```
The compression type is exactly the same for every file, and the compressed
sizes are exactly the same for all files other than the stripped shared
libraries and `RECORD`. I tested this with the following snippet:
```
import zipfile

old_path = 'pyarrow-17.0.0.dev260-cp38-cp38-manylinux_2_28_x86_64.whl'
new_path = 'pyarrow-17.0.0.dev275-cp38-cp38-manylinux_2_28_x86_64.whl'

with zipfile.ZipFile(old_path, 'r') as old_wheel, \
        zipfile.ZipFile(new_path, 'r') as new_wheel:
    old_infos = sorted(old_wheel.infolist(), key=lambda x: x.filename)
    new_infos = sorted(new_wheel.infolist(), key=lambda x: x.filename)
    for info_old, info_new in zip(old_infos, new_infos):
        # Entries whose names embed the version (dev260) differ between wheels.
        if "dev260" not in info_old.filename:
            assert info_old.filename == info_new.filename, \
                info_old.filename + info_new.filename
        assert info_old.compress_type == info_new.compress_type
        # RECORD is skipped for the size check.
        if "RECORD" in info_old.filename:
            continue
        # Shared libraries were stripped, so their sizes are not compared.
        if not (info_old.filename.endswith('.so') or
                info_old.filename.endswith('.so.1700')):
            assert info_old.compress_size == info_new.compress_size, \
                f"old: {info_old.filename, info_old.compress_size}, " \
                f"new: {info_new.filename, info_new.compress_size}"
        print(f'filename: {info_old.filename} ')
        print(f'compress type: {info_old.compress_type}')
        print(f'compress size: {info_new.compress_size}')
```
I am not showing the full output, but this is an example of what I get:
```
...
filename: pyarrow/vendored/docscrape.py
compress type: 8
compress size: 6080
filename: pyarrow/vendored/version.py
compress type: 8
compress size: 3894
filename: scripts/
compress type: 0
compress size: 0
filename: scripts/test_imports.py
compress type: 8
compress size: 496
filename: scripts/test_leak.py
compress type: 8
compress size: 1422
```
I am not an expert on zip compression, but all tests for the wheels are
successful on the CI jobs, and my validations seem to show the exact same level
of compression and the same compressed size for all files in both wheels other
than the stripped shared libraries.
Is there any other test you think I should run, or another way I could check?
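
One further check that could be run is sketched below. It is an illustration only, not part of the PR. Since the snippet above skips the shared libraries, this compares their uncompressed (`file_size`) and compressed (`compress_size`) sizes between the two wheels, to show how much of the reduction comes from stripping. The helper names `is_shared_library`, `shared_library_sizes` and `compare_shared_libraries` are hypothetical, and the check assumes the libraries keep the same paths in both wheels.

```python
import zipfile


def is_shared_library(name):
    # Matches names such as "pyarrow/lib.so" and "pyarrow/libarrow.so.1700".
    basename = name.rsplit("/", 1)[-1]
    return basename.endswith(".so") or ".so." in basename


def shared_library_sizes(wheel_path):
    """Map each shared library in the wheel to (file_size, compress_size)."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return {
            info.filename: (info.file_size, info.compress_size)
            for info in wheel.infolist()
            if is_shared_library(info.filename)
        }


def compare_shared_libraries(old_wheel_path, new_wheel_path):
    """Return (name, old_size, new_size, old_compressed, new_compressed)
    for every shared library present in both wheels, sorted by name."""
    old = shared_library_sizes(old_wheel_path)
    new = shared_library_sizes(new_wheel_path)
    rows = []
    for name in sorted(old.keys() & new.keys()):
        old_size, old_compressed = old[name]
        new_size, new_compressed = new[name]
        rows.append((name, old_size, new_size, old_compressed, new_compressed))
    return rows


# Example usage (paths are placeholders for the two wheels being compared):
# for row in compare_shared_libraries("old.whl", "new.whl"):
#     print(row)
```

If stripping works as intended, each library's `new_size` would be expected to be no larger than its `old_size`. Libraries that appear in only one wheel are ignored by this sketch and would need a separate check.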