[
https://issues.apache.org/jira/browse/ARROW-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Antoine Pitrou updated ARROW-15141:
-----------------------------------
Description:
Hi, I am randomly getting the following error when first running inference with
a TensorFlow model and then writing the result to a `.parquet` file:
{code}
Fatal error condition occurred in
/home/conda/feedstock_root/build_artifacts/aws-c-io_1633633131324/work/source/event_loop.c:72:
aws_thread_launch(&cleanup_thread, s_event_loop_destroy_async_thread_fn,
el_group, &thread_options) == AWS_OP_SUCCESS
Exiting Application
################################################################################
Stack trace:
################################################################################
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_backtrace_print+0x59)
[0x7ffb14235f19]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_fatal_assert+0x48)
[0x7ffb14227098]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../.././././libaws-c-io.so.1.0.0(+0x10a43)
[0x7ffb1406ea43]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_ref_count_release+0x1d)
[0x7ffb14237fad]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../.././././libaws-c-io.so.1.0.0(+0xe35a)
[0x7ffb1406c35a]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_ref_count_release+0x1d)
[0x7ffb14237fad]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-crt-cpp.so(_ZN3Aws3Crt2Io15ClientBootstrapD1Ev+0x3a)
[0x7ffb142a2f5a]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../.././libaws-cpp-sdk-core.so(+0x5f570)
[0x7ffb147fd570]
/lib/x86_64-linux-gnu/libc.so.6(+0x49a27) [0x7ffb17f7da27]
/lib/x86_64-linux-gnu/libc.so.6(on_exit+0) [0x7ffb17f7dbe0]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfa) [0x7ffb17f5b0ba]
/home/<user>/miniconda3/envs/spliceai_env/bin/python3.9(+0x20aa51)
[0x562576609a51]
/bin/bash: line 1: 2341494 Aborted (core dumped)
{code}
My colleague ran into the same issue on CentOS 8 while running the same job
with the same environment on SLURM, so I suspect an interaction between
TensorFlow and PyArrow.
I also found a GitHub issue where multiple people report the same error:
[https://github.com/huggingface/datasets/issues/3310]
Resolving this bug is very important to my lab, as we can no longer work with
Parquet files. Unfortunately, we do not have the expertise to fix it ourselves.
> [C++] Fatal error condition occurred in aws_thread_launch
> ---------------------------------------------------------
>
> Key: ARROW-15141
> URL: https://issues.apache.org/jira/browse/ARROW-15141
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 6.0.0, 6.0.1
> Environment: - `uname -a`:
> Linux datalab2 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021
> x86_64 x86_64 x86_64 GNU/Linux
> - `mamba list | grep -i "pyarrow\|tensorflow\|^python"`
> pyarrow 6.0.0 py39hff6fa39_1_cpu conda-forge
> python 3.9.7 hb7a2778_3_cpython conda-forge
> python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
> python-flatbuffers 1.12 pyhd8ed1ab_1 conda-forge
> python-irodsclient 1.0.0 pyhd8ed1ab_0 conda-forge
> python-rocksdb 0.7.0 py39h7fcd5f3_4 conda-forge
> python_abi 3.9 2_cp39 conda-forge
> tensorflow 2.6.2 cuda112py39h9333c2f_0 conda-forge
> tensorflow-base 2.6.2 cuda112py39h7de589b_0 conda-forge
> tensorflow-estimator 2.6.2 cuda112py39h9333c2f_0 conda-forge
> tensorflow-gpu 2.6.2 cuda112py39h0bbbad9_0 conda-forge
> Reporter: F. H.
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.1#820001)