F. H. created ARROW-15141:
-----------------------------
Summary: Fatal error condition occurred in aws_thread_launch
Key: ARROW-15141
URL: https://issues.apache.org/jira/browse/ARROW-15141
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 6.0.1, 6.0.0
Environment: - `uname -a`:
Linux datalab2 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- `mamba list | grep -i "pyarrow\|tensorflow\|^python"`
pyarrow 6.0.0 py39hff6fa39_1_cpu conda-forge
python 3.9.7 hb7a2778_3_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-flatbuffers 1.12 pyhd8ed1ab_1 conda-forge
python-irodsclient 1.0.0 pyhd8ed1ab_0 conda-forge
python-rocksdb 0.7.0 py39h7fcd5f3_4 conda-forge
python_abi 3.9 2_cp39 conda-forge
tensorflow 2.6.2 cuda112py39h9333c2f_0 conda-forge
tensorflow-base 2.6.2 cuda112py39h7de589b_0 conda-forge
tensorflow-estimator 2.6.2 cuda112py39h9333c2f_0 conda-forge
tensorflow-gpu 2.6.2 cuda112py39h0bbbad9_0 conda-forge
Reporter: F. H.
Hi, I randomly get the following error when I first run inference with a
TensorFlow model and then write the result to a `.parquet` file:
```
Fatal error condition occurred in
/home/conda/feedstock_root/build_artifacts/aws-c-io_1633633131324/work/source/event_loop.c:72:
aws_thread_launch(&cleanup_thread, s_event_loop_destroy_async_thread_fn,
el_group, &thread_options) == AWS_OP_SUCCESS
Exiting Application
################################################################################
Stack trace:
################################################################################
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_backtrace_print+0x59)
[0x7ffb14235f19]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_fatal_assert+0x48)
[0x7ffb14227098]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../.././././libaws-c-io.so.1.0.0(+0x10a43)
[0x7ffb1406ea43]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_ref_count_release+0x1d)
[0x7ffb14237fad]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../.././././libaws-c-io.so.1.0.0(+0xe35a)
[0x7ffb1406c35a]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_ref_count_release+0x1d)
[0x7ffb14237fad]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-crt-cpp.so(_ZN3Aws3Crt2Io15ClientBootstrapD1Ev+0x3a)
[0x7ffb142a2f5a]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../.././libaws-cpp-sdk-core.so(+0x5f570)
[0x7ffb147fd570]
/lib/x86_64-linux-gnu/libc.so.6(+0x49a27) [0x7ffb17f7da27]
/lib/x86_64-linux-gnu/libc.so.6(on_exit+0) [0x7ffb17f7dbe0]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfa) [0x7ffb17f5b0ba]
/home/<user>/miniconda3/envs/spliceai_env/bin/python3.9(+0x20aa51)
[0x562576609a51]
/bin/bash: line 1: 2341494 Aborted (core dumped)
```
My colleague ran into the same issue on CentOS 8 while running the same job in
the same environment on SLURM, so I guess it could be an issue with the
combination of TensorFlow and pyarrow.
I also found a GitHub issue where multiple people report the same error:
[https://github.com/huggingface/datasets/issues/3310]
It is very important to my lab that this bug gets resolved, as we can no longer
work with Parquet. Unfortunately, we do not have the knowledge to fix it
ourselves.