nveloso opened a new pull request, #45470:
URL: https://github.com/apache/arrow/pull/45470
### Rationale for this change
Please check #18036.
### What changes are included in this PR?
Almost everything needed to build and test Python wheels for musllinux.
The `python-wheel-musllinux-test-unittests` service is currently broken (see
the next section), and I still need to test running the
`alpine-linux-verify-rc` Docker image.
### Are these changes tested?
I was able to successfully generate a musllinux wheel by running the
following:
```
docker-compose build python-wheel-musllinux-1-2
docker-compose run python-wheel-musllinux-1-2
```
I was also able to run `python-wheel-musllinux-test-imports` with no errors.
I can't get `python-wheel-musllinux-test-unittests` to pass because 2 tests
are failing, and I don't think the failures are related to my changes. Can you
please confirm?
The failing tests are:
- test_uwsgi_integration
- test_print_stats
I believe both failures share the same root cause, which is related to this:
`/arrow/cpp/src/arrow/filesystem/s3fs.cc:3461: arrow::fs::FinalizeS3 was
not called even though S3 was initialized. This could lead to a segmentation
fault at exit !!! uWSGI process 3487 got Segmentation Fault !!!`
Do you have any idea what it might be?
Here are some logs of the failed tests:
```
====================================================================================== FAILURES ======================================================================================
_______________________________________________________________________________ test_uwsgi_integration _______________________________________________________________________________
    @pytest.mark.s3
    def test_uwsgi_integration():
        # GH-44071: using S3FileSystem under uwsgi shouldn't lead to a crash at shutdown
        try:
            subprocess.check_call(["uwsgi", "--version"])
        except FileNotFoundError:
            pytest.skip("uwsgi not installed on this Python")
        port = find_free_port()
        args = ["uwsgi", "-i", "--http", f"127.0.0.1:{port}",
                "--wsgi-file", os.path.join(here, "wsgi_examples.py")]
        proc = subprocess.Popen(args, stdin=subprocess.DEVNULL)
        # Try to fetch URL, it should return 200 Ok...
        try:
            url = f"http://127.0.0.1:{port}/s3/"
            start_time = time.time()
            error = None
            while time.time() < start_time + 5:
                try:
                    with urlopen(url) as resp:
                        assert resp.status == 200
                    break
                except OSError as e:
                    error = e
                    time.sleep(0.1)
            else:
                pytest.fail(f"Could not fetch {url!r}: {error}")
        finally:
            proc.terminate()
        # ... and uwsgi should gracefully shutdown after it's been asked above
>       assert proc.wait() == 30  # UWSGI_END_CODE = 30
E       AssertionError: assert -11 == 30
E        +  where -11 = wait()
E        +    where wait = <Popen: returncode: -11 args: ['uwsgi', '-i', '--http', '127.0.0.1:49245', '...>.wait

usr/local/lib/python3.9/site-packages/pyarrow/tests/test_fs.py:2052: AssertionError
-------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------
2.0.28
-------------------------------------------------------------------------------- Captured stderr call --------------------------------------------------------------------------------
*** Starting uWSGI 2.0.28 (64bit) on [Sat Feb 8 18:56:14 2025] ***
compiled with version: 13.2.1 20231014 on 31 October 2024 19:02:44
os: Linux-6.8.0-50-generic #51-Ubuntu SMP PREEMPT_DYNAMIC Sat Nov 9 18:03:35 UTC 2024
nodename: ae5a02215122
machine: aarch64
clock source: unix
pcre jit disabled
detected number of CPU cores: 4
current working directory: /
detected binary path: /usr/local/bin/python3.9
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
*** WARNING: you are running uWSGI without its master process manager ***
your memory page size is 4096 bytes
detected max file descriptor number: 1048576
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uWSGI http bound on 127.0.0.1:49245 fd 4
spawned uWSGI http 1 (pid: 3488)
uwsgi socket 0 bound to TCP address 127.0.0.1:40033 (port auto-assigned) fd 3
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
Python version: 3.9.19 (main, Mar 20 2024, 20:45:15) [GCC 12.2.1 20220924]
--- Python VM already initialized ---
Python main interpreter initialized at 0xeff90822b840
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
python threads support enabled
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 72904 bytes (71 KB) for 1 cores
*** Operational MODE: single process ***
WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0xeff90822b840
pid: 3487 (default app)
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
spawned uWSGI worker 1 (and the only) (pid: 3487, cores: 1)
[pid: 3487|app: 0|req: 1/1] 127.0.0.1 () {30 vars in 346 bytes} [Sat Feb 8 18:56:14 2025] GET /s3/ => generated 12 bytes in 20 msecs (HTTP/1.1 200) 1 headers in 44 bytes (1 switches on core 0)
/arrow/cpp/src/arrow/filesystem/s3fs.cc:3461: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit
!!! uWSGI process 3487 got Segmentation Fault !!!
________________________________________________________________________ test_print_stats[system_memory_pool] ________________________________________________________________________

pool_factory = <cyfunction system_memory_pool at 0xe04d9b5c3ad0>

    @pytest.mark.parametrize('pool_factory', supported_factories())
    def test_print_stats(pool_factory):
        code = f"""if 1:
            import pyarrow as pa

            pool = pa.{pool_factory.__name__}()
            buf = pa.allocate_buffer(64, memory_pool=pool)
            pool.print_stats()
        """
        res = subprocess.run([sys.executable, "-c", code], check=True,
                             universal_newlines=True, stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE)
        if sys.platform == "linux":
            # On Linux at least, all memory pools should emit statistics
>           assert res.stderr.strip() != ""
E           AssertionError: assert '' != ''
E            +  where '' = <built-in method strip of str object at 0xe04d9c3ec6f0>()
E            +    where <built-in method strip of str object at 0xe04d9c3ec6f0> = ''.strip
E            +      where '' = CompletedProcess(args=['/usr/local/bin/python', '-c', 'if 1:\n import pyarrow as pa\n\n pool = pa.system...= pa.allocate_buffer(64, memory_pool=pool)\n pool.print_stats()\n '], returncode=0, stdout='', stderr='').stderr

usr/local/lib/python3.9/site-packages/pyarrow/tests/test_memory.py:295: AssertionError
```
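For reference, the `-11` in the first failure is how Python's `subprocess` reports a child that was killed by a signal: on POSIX, a negative return code `-N` means "terminated by signal N". The snippet below is only a minimal sketch to decode that value; `describe_returncode` is a hypothetical helper and not part of pyarrow or of this PR.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Describe a return code as reported by subprocess.Popen.wait().

    Hypothetical helper for illustration only.
    """
    if returncode < 0:
        # On POSIX, -N means the child was terminated by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number unknown on this platform.
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


# Value observed in the failing assertion above.
print(describe_returncode(-11))
# Value the test expects (UWSGI_END_CODE).
print(describe_returncode(30))
```

On Linux, signal 11 is `SIGSEGV`, which matches the "got Segmentation Fault" line in the captured stderr.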
There are also a lot of skipped tests (603), and I'm not sure whether this is OK.
Here is the final report:
`============================================= 2 failed, 7200 passed, 603 skipped, 12 xfailed, 2 xpassed, 5 warnings in 80.21s (0:01:20) ==============================================`
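To put the skip count in proportion, here is a rough sketch that parses the summary line above and computes the share of skipped tests. `parse_summary` is a hypothetical helper written for this illustration, and the regular expression assumes the `<count> <outcome>` layout shown in the report.

```python
import re

# Summary line copied from the report above (without the "=" padding).
SUMMARY = ("2 failed, 7200 passed, 603 skipped, 12 xfailed, "
           "2 xpassed, 5 warnings in 80.21s (0:01:20)")


def parse_summary(line: str) -> dict:
    """Extract '<count> <outcome>' pairs from a pytest summary line.

    Hypothetical helper for illustration only.
    """
    return {name: int(count)
            for count, name in re.findall(r"(\d+) ([a-z]+)", line)}


counts = parse_summary(SUMMARY)
# Warnings are not test outcomes, so leave them out of the total.
total = sum(n for name, n in counts.items() if name != "warnings")
skipped_share = counts["skipped"] / total
print(f"{counts['skipped']} of {total} tests skipped ({skipped_share:.1%})")
```

The share alone does not say whether the skips are expected; re-running pytest with `-rs` lists the reason for each skipped test, which is probably the quicker way to check.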
### Are there any user-facing changes?
I don't think so.