tcsavage commented on issue #50188:
URL: https://github.com/apache/arrow/issues/50188#issuecomment-4752545055

   I've tested one of the new wheels (specifically [this one](https://github.com/ursacomputing/crossbow/releases/download/actions-c6b0d8e743-github-wheel-manylinux-2-28-cp313-cp313-amd64/pyarrow-25.0.0.dev150-cp313-cp313-manylinux_2_28_x86_64.whl)) in my harness and it still exhibits the problem (won't let Python exit).
   
   Here's some more information. Hopefully somebody else can replicate this.
   
   # Harness
   
   My harness consists of a Docker image I can run inside a pre-existing EKS 
cluster.
   
   Dockerfile:
   
   ```
   FROM python:3.13.14-trixie
   
   RUN pip install https://github.com/ursacomputing/crossbow/releases/download/actions-c6b0d8e743-github-wheel-manylinux-2-28-cp313-cp313-amd64/pyarrow-25.0.0.dev150-cp313-cp313-manylinux_2_28_x86_64.whl
   ```
   
   This image was built and pushed to an internal registry.
   
   I can launch a pod in the EKS cluster like so:
   
   ```bash
   kubectl run pyarrow-s3-test --rm -it --restart=Never \
     --image=XXXXXXX \
     --overrides='{"spec":{"serviceAccountName":"YYYYYYY","nodeSelector":{"kubernetes.io/arch":"amd64"}}}' \
     --command -- python -c 'import pyarrow as pa; import pyarrow.fs as pafs; pa.show_versions(); print("", flush=True); fs = pafs.S3FileSystem(); print("Exiting...", flush=True)'
   ```
   
   (The node selector is just to force scheduling on an AMD64 instance because 
that's what the wheel was built for.)
   
   This pod will print the following, then hang forever until killed:
   
   ```
   pyarrow version info
   --------------------
   Package kind              : python-wheel-manylinux228
   Arrow C++ library version : 25.0.0-SNAPSHOT
   Arrow C++ compiler        : GNU 14.2.1
   Arrow C++ compiler flags  :  -Wno-noexcept-type -Wno-self-move -Wno-subobject-linkage  -fdiagnostics-color=always  -Wall -fno-semantic-interposition -msse4.2
   Arrow C++ git revision    :
   Arrow C++ git description :
   Arrow C++ build type      : release
   PyArrow build type        : release
   
   Exiting...
   ```
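
For anyone trying to replicate this without watching the pod by hand, the hang can be detected by running the same one-liner in a child interpreter with a timeout. This is only a minimal sketch: `exits_within` and `REPRO` are hypothetical names introduced here, the timeout value is arbitrary, and the snippet assumes the same wheel and S3 credentials setup as the harness above.

```python
import subprocess
import sys

# The one-liner from the kubectl command above, without show_versions().
REPRO = (
    "import pyarrow.fs as pafs; "
    "fs = pafs.S3FileSystem(); "
    'print("Exiting...", flush=True)'
)


def exits_within(code: str, timeout_s: float) -> bool:
    """Run `code` in a fresh interpreter.

    Returns True if the child exited (with any status) before the timeout,
    and False if it was still running and had to be killed.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising.
        return False
    return True
```

In an environment that shows the problem, `exits_within(REPRO, 60)` would be expected to return `False`. The return code is deliberately ignored, so a crash also counts as an exit.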
   
   # Traces
   
   Running `pstree` inside the stuck container shows me this:
   
   ```
   # pstree -pUta
   python,1 -c import pyarrow as pa; import pyarrow.fs as pafs; pa.show_versions(); fs = pafs.S3FileSystem(); print("Exiting...", flush=True)
     ├─(dpkg-preconfigu,51)
     ├─(dpkg-preconfigu,216)
     ├─{AwsEventLoop1},8
     └─{jemalloc_bg_thd},7
   ```
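
If `pstree` is not available in the container, the same thread names can be read on Linux from `/proc/<pid>/task/<tid>/comm`. The helper below is a sketch under that assumption; `thread_names` is a hypothetical name, not part of pyarrow or any tool used above.

```python
from pathlib import Path


def thread_names(pid: int) -> dict[int, str]:
    """Map each thread ID of `pid` to its kernel thread name (Linux only)."""
    names = {}
    for task in Path(f"/proc/{pid}/task").iterdir():
        try:
            names[int(task.name)] = (task / "comm").read_text().strip()
        except OSError:
            # The thread exited between listing the directory and reading it.
            continue
    return names
```

Calling `thread_names(1)` inside the stuck container should list the same threads that `pstree` reports for the `python` process.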
   
   And a GDB trace:
   
   ```
   # gdb -p 1 -batch \
       -ex "set pagination off" \
       -ex "thread apply all bt" \
       -ex "detach" -ex "quit"
   [New LWP 8]
   [New LWP 7]
   [Thread debugging using libthread_db enabled]
   Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
   __syscall_cancel_arch () at 
../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
   
   warning: 56     ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S: No such 
file or directory
   Python Exception <class 'ModuleNotFoundError'>: No module named 'math'
   
   Thread 3 (Thread 0x7fc53a1ff6c0 (LWP 7) "jemalloc_bg_thd"):
   #0  __syscall_cancel_arch () at 
../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
   #1  0x00007fc540371668 in __internal_syscall_cancel (a1=<optimized out>, 
a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, 
a6=a6@entry=4294967295, nr=202) at ./nptl/cancellation.c:49
   #2  0x00007fc540371c8c in __futex_abstimed_wait_common64 (private=0, 
futex_word=0x7fc53a816628, expected=<optimized out>, op=<optimized out>, 
abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
   #3  __futex_abstimed_wait_common 
(futex_word=futex_word@entry=0x7fc53a816628, expected=<optimized out>, 
clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, 
cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
   #4  0x00007fc540371ceb in __GI___futex_abstimed_wait_cancelable64 
(futex_word=futex_word@entry=0x7fc53a816628, expected=<optimized out>, 
clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at 
./nptl/futex-internal.c:139
   #5  0x00007fc540374158 in __pthread_cond_wait_common (cond=0x7fc53a816608, 
mutex=0x7fc53a816678, clockid=0, abstime=0x0) at ./nptl/pthread_cond_wait.c:426
   #6  ___pthread_cond_wait (cond=0x7fc53a816608, mutex=0x7fc53a816678) at 
./nptl/pthread_cond_wait.c:458
   #7  0x00007fc53c440982 in background_thread_entry () from 
/usr/local/lib/python3.13/site-packages/pyarrow/libarrow.so.2500
   #8  0x00007fc540374b7b in start_thread (arg=<optimized out>) at 
./nptl/pthread_create.c:448
   #9  0x00007fc5403f27f8 in __GI___clone3 () at 
../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
   
   Thread 2 (Thread 0x7fc533fff6c0 (LWP 8) "AwsEventLoop1"):
   #0  __syscall_cancel_arch () at 
../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
   #1  0x00007fc540371668 in __internal_syscall_cancel (a1=<optimized out>, 
a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, 
a6=a6@entry=0, nr=232) at ./nptl/cancellation.c:49
   #2  0x00007fc5403716ad in __syscall_cancel (a1=<optimized out>, 
a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, 
a6=a6@entry=0, nr=232) at ./nptl/cancellation.c:75
   #3  0x00007fc5403f2aad in epoll_wait (epfd=<optimized out>, 
events=<optimized out>, maxevents=<optimized out>, timeout=<optimized out>) at 
../sysdeps/unix/sysv/linux/epoll_wait.c:30
   #4  0x00007fc53c35132a in aws_event_loop_thread () from 
/usr/local/lib/python3.13/site-packages/pyarrow/libarrow.so.2500
   #5  0x00007fc53c4198b9 in thread_fn () from 
/usr/local/lib/python3.13/site-packages/pyarrow/libarrow.so.2500
   #6  0x00007fc540374b7b in start_thread (arg=<optimized out>) at 
./nptl/pthread_create.c:448
   #7  0x00007fc5403f27f8 in __GI___clone3 () at 
../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
   
   Thread 1 (Thread 0x7fc5401f1380 (LWP 1) "python"):
   #0  __syscall_cancel_arch () at 
../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
   #1  0x00007fc540371668 in __internal_syscall_cancel (a1=<optimized out>, 
a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, 
a6=a6@entry=4294967295, nr=202) at ./nptl/cancellation.c:49
   #2  0x00007fc540371c8c in __futex_abstimed_wait_common64 (private=0, 
futex_word=0x7fc53d75f740 <s_managed_thread_signal+32>, expected=<optimized 
out>, op=<optimized out>, abstime=0x0, cancel=true) at 
./nptl/futex-internal.c:57
   #3  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7fc53d75f740 
<s_managed_thread_signal+32>, expected=<optimized out>, 
clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, 
cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
   #4  0x00007fc540371ceb in __GI___futex_abstimed_wait_cancelable64 
(futex_word=futex_word@entry=0x7fc53d75f740 <s_managed_thread_signal+32>, 
expected=<optimized out>, clockid=clockid@entry=0, abstime=abstime@entry=0x0, 
private=private@entry=0) at ./nptl/futex-internal.c:139
   #5  0x00007fc540374158 in __pthread_cond_wait_common (cond=0x7fc53d75f720 
<s_managed_thread_signal>, mutex=0x7fc53d75f760 <s_managed_thread_lock>, 
clockid=0, abstime=0x0) at ./nptl/pthread_cond_wait.c:426
   #6  ___pthread_cond_wait (cond=0x7fc53d75f720 <s_managed_thread_signal>, 
mutex=0x7fc53d75f760 <s_managed_thread_lock>) at ./nptl/pthread_cond_wait.c:458
   #7  0x00007fc53c4171a9 in aws_condition_variable_wait () from 
/usr/local/lib/python3.13/site-packages/pyarrow/libarrow.so.2500
   #8  0x00007fc53c40eaa4 in aws_condition_variable_wait_pred () from 
/usr/local/lib/python3.13/site-packages/pyarrow/libarrow.so.2500
   #9  0x00007fc53c41bbdc in aws_thread_join_all_managed () from 
/usr/local/lib/python3.13/site-packages/pyarrow/libarrow.so.2500
   #10 0x00007fc53c2ddf05 in Aws::Crt::ApiHandle::~ApiHandle() () from 
/usr/local/lib/python3.13/site-packages/pyarrow/libarrow.so.2500
   #11 0x00007fc53c280dc3 in Aws::CleanupCrt() () from 
/usr/local/lib/python3.13/site-packages/pyarrow/libarrow.so.2500
   #12 0x00007fc53c27e6f5 in Aws::ShutdownAPI(Aws::SDKOptions const&) () from 
/usr/local/lib/python3.13/site-packages/pyarrow/libarrow.so.2500
   #13 0x00007fc53b86de88 in arrow::fs::EnsureS3Finalized() () from 
/usr/local/lib/python3.13/site-packages/pyarrow/libarrow.so.2500
   #14 0x00007fc538987980 in 
__pyx_pw_7pyarrow_5_s3fs_7ensure_s3_finalized(_object*, _object*) () from 
/usr/local/lib/python3.13/site-packages/pyarrow/_s3fs.cpython-313-x86_64-linux-gnu.so
   #15 0x00007fc540744db7 in atexit_callfuncs (state=state@entry=0x7fc5409adb70 
<_PyRuntime+99120>) at ./Modules/atexitmodule.c:144
   #16 0x00007fc540723e42 in _PyAtExit_Call (interp=0x7fc5409ab1a8 
<_PyRuntime+88424>) at ./Modules/atexitmodule.c:165
   #17 _Py_Finalize (runtime=<optimized out>) at Python/pylifecycle.c:2052
   #18 0x00007fc54073c693 in Py_RunMain () at Modules/main.c:778
   #19 0x00007fc5406f558c in Py_BytesMain (argc=<optimized out>, 
argv=<optimized out>) at Modules/main.c:830
   #20 0x00007fc54030bca8 in __libc_start_call_main 
(main=main@entry=0x5608b66ce140 <main>, argc=argc@entry=3, 
argv=argv@entry=0x7ffc0ef763d8) at ../sysdeps/nptl/libc_start_call_main.h:58
   #21 0x00007fc54030bd65 in __libc_start_main_impl (main=0x5608b66ce140 
<main>, argc=3, argv=0x7ffc0ef763d8, init=<optimized out>, fini=<optimized 
out>, rtld_fini=<optimized out>, stack_end=0x7ffc0ef763c8) at 
../csu/libc-start.c:360
   #22 0x00005608b66ce071 in _start ()
   [Inferior 1 (process 1) detached]
   ```
   
   (I think we can ignore the `ModuleNotFoundError`. I believe it is coming from GDB itself.)
   
   

