Mike Gevaert created ARROW-17319:
------------------------------------

             Summary: pyarrow seems to set default CPU affinity to 0 on 
shutdown, crashes if CPU 0 is not available
                 Key: ARROW-17319
                 URL: https://issues.apache.org/jira/browse/ARROW-17319
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 9.0.0
         Environment: Ubuntu 20.02 / Python 3.8.10 (default, Jun 22 2022, 
20:18:18)

$ pip list 
Package         Version
--------------- -------
numpy           1.23.1 
pandas          1.4.3  
pip             20.0.2 
pkg-resources   0.0.0  
pyarrow         9.0.0  
python-dateutil 2.8.2  
pytz            2022.1 
setuptools      44.0.0 
six             1.16.0 
            Reporter: Mike Gevaert


I get the following traceback when exiting Python after loading 
{{pyarrow.parquet}}:

{code}
Python 3.8.10 (default, Jun 22 2022, 20:18:18) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.getpid()
25106
>>> import pyarrow.parquet
>>> 
Fatal error condition occurred in 
/opt/vcpkg/buildtrees/aws-c-io/src/9e6648842a-364b708815.clean/source/event_loop.c:72:
 aws_thread_launch(&cleanup_thread, s_event_loop_destroy_async_thread_fn, 
el_group, &thread_options) == AWS_OP_SUCCESS
Exiting Application
################################################################################
Stack trace:
################################################################################
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x200af06) 
[0x7f831b2b3f06]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x20028e5) 
[0x7f831b2ab8e5]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x1f27e09) 
[0x7f831b1d0e09]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x200ba3d) 
[0x7f831b2b4a3d]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x1f25948) 
[0x7f831b1ce948]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x200ba3d) 
[0x7f831b2b4a3d]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x1ee0b46) 
[0x7f831b189b46]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x194546a) 
[0x7f831abee46a]
/lib/x86_64-linux-gnu/libc.so.6(+0x468a7) [0x7f831c6188a7]
/lib/x86_64-linux-gnu/libc.so.6(on_exit+0) [0x7f831c618a60]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfa) [0x7f831c5f608a]
{code}

To replicate this, one needs to make sure that CPU 0 isn't available to 
schedule tasks on.  In our HPC environment, that happens because Slurm uses 
cgroups to constrain CPU usage.

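On a Linux workstation, one should be able to reproduce it as follows:
1) open Python as a normal user
2) get the PID of that Python process (e.g. with os.getpid())
3) as root, with $PID set to the PID from step 2: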
{code}
cd /sys/fs/cgroup/cpuset/
mkdir pyarrow
cd pyarrow
echo 0 > cpuset.mems
echo 1 > cpuset.cpus # sets the cgroup to only have access to cpu 1
echo $PID > tasks
{code}
Then, in the Python environment:
{code}
import pyarrow.parquet
exit()
{code}
This should trigger the crash.
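
Before exiting the interpreter, it can help to confirm that the cgroup change actually took effect for the process. The following is a minimal sketch, assuming Linux and assuming that the kernel reflects the cpuset restriction in the process's affinity mask. The helper names allowed_cpus and cpu0_is_schedulable are made up for illustration and are not part of pyarrow.

```python
import os


def cpu0_is_schedulable(cpus):
    """Return True if CPU 0 is in the given collection of allowed CPU ids."""
    return 0 in cpus


def allowed_cpus(pid=0):
    """Return the set of CPUs the process may run on (pid 0 = this process)."""
    # os.sched_getaffinity is only available on some platforms (e.g. Linux).
    if not hasattr(os, "sched_getaffinity"):
        raise NotImplementedError(
            "os.sched_getaffinity is unavailable on this platform"
        )
    return set(os.sched_getaffinity(pid))


if __name__ == "__main__":
    cpus = allowed_cpus()
    print("pid:", os.getpid())
    print("allowed CPUs:", sorted(cpus))
    print("CPU 0 available:", cpu0_is_schedulable(cpus))
```

With the cgroup from step 3 in place, running this in the same interpreter would be expected to list only CPU 1 and report CPU 0 as unavailable. If CPU 0 still shows up, the PID was probably not moved into the cgroup.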

Sadly, I couldn't track down which versions of {{aws-c-common}} and 
{{aws-c-io}} are being used for the 9.0.0 py38 manylinux wheels. 
(libarrow.so.900 has BuildID[sha1]=dd6c5a2efd5cacf09657780a58c40f7c930e4df1)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
