[ 
https://issues.apache.org/jira/browse/ARROW-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-15141:
-----------------------------------
    Description: 
Hi, I am randomly getting the following error when first running inference with 
a TensorFlow model and then writing the result to a `.parquet` file:

{code}
Fatal error condition occurred in 
/home/conda/feedstock_root/build_artifacts/aws-c-io_1633633131324/work/source/event_loop.c:72:
 aws_thread_launch(&cleanup_thread, s_event_loop_destroy_async_thread_fn, 
el_group, &thread_options) == AWS_OP_SUCCESS
Exiting Application
################################################################################
Stack trace:
################################################################################
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_backtrace_print+0x59)
 [0x7ffb14235f19]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_fatal_assert+0x48)
 [0x7ffb14227098]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../.././././libaws-c-io.so.1.0.0(+0x10a43)
 [0x7ffb1406ea43]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_ref_count_release+0x1d)
 [0x7ffb14237fad]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../.././././libaws-c-io.so.1.0.0(+0xe35a)
 [0x7ffb1406c35a]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_ref_count_release+0x1d)
 [0x7ffb14237fad]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-crt-cpp.so(_ZN3Aws3Crt2Io15ClientBootstrapD1Ev+0x3a)
 [0x7ffb142a2f5a]
/home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../.././libaws-cpp-sdk-core.so(+0x5f570)
 [0x7ffb147fd570]
/lib/x86_64-linux-gnu/libc.so.6(+0x49a27) [0x7ffb17f7da27]
/lib/x86_64-linux-gnu/libc.so.6(on_exit+0) [0x7ffb17f7dbe0]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfa) [0x7ffb17f5b0ba]
/home/<user>/miniconda3/envs/spliceai_env/bin/python3.9(+0x20aa51) 
[0x562576609a51]
/bin/bash: line 1: 2341494 Aborted                 (core dumped)
{code}
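
For context, the failing job follows roughly this pattern (a simplified sketch; the model, input data, and file paths below are placeholders, not our actual pipeline):

{code:python}
import numpy as np
import tensorflow as tf
import pyarrow as pa
import pyarrow.parquet as pq

# Placeholder model and input; the real job runs inference with a trained model.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
predictions = model.predict(np.random.rand(100, 4))

# Writing the predictions to Parquet is the step that precedes the abort.
table = pa.table({"prediction": predictions.ravel()})
pq.write_table(table, "/tmp/predictions.parquet")
{code}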


My colleague ran into the same issue on CentOS 8 while running the same job with 
the same environment on SLURM, so I suspect it is some interaction between 
TensorFlow and PyArrow.
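
To check whether TensorFlow is actually involved, a minimal isolation test could look like the following (a hypothetical script, paths are placeholders); the idea is to run the same Parquet write with and without TensorFlow loaded:

{code:python}
import sys

# Only import TensorFlow when asked, to compare the two cases.
if "--with-tf" in sys.argv:
    import tensorflow  # noqa: F401

import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

pq.write_table(pa.table({"x": np.arange(1000)}), "/tmp/isolation_test.parquet")
print("write finished; the abort, if any, happens while the process exits")
{code}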

I also found a GitHub issue where several other people report the same problem:
[https://github.com/huggingface/datasets/issues/3310]

 

It is very important to my lab that this bug gets resolved, as we currently 
cannot work with Parquet at all. Unfortunately, we do not have the expertise to 
fix it ourselves.
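
Judging from the stack trace, the abort happens inside libc's exit handlers (the on_exit frame), i.e. after the Parquet file has already been written. As a stopgap we are considering skipping the normal interpreter shutdown, which would avoid the AWS SDK cleanup path; this is only a guess on my side, not a proper fix (a sketch, with placeholder data):

{code:python}
import os
import sys
import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

# ... inference would normally happen here; placeholder result instead ...
pq.write_table(pa.table({"prediction": np.random.rand(10)}), "/tmp/result.parquet")

# os._exit() bypasses the atexit/on_exit handlers (where the abort occurs),
# but it also skips buffer flushing and other cleanup, so flush first.
sys.stdout.flush()
sys.stderr.flush()
os._exit(0)
{code}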

> [C++] Fatal error condition occurred in aws_thread_launch
> ---------------------------------------------------------
>
>                 Key: ARROW-15141
>                 URL: https://issues.apache.org/jira/browse/ARROW-15141
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 6.0.0, 6.0.1
>         Environment: - `uname -a`:
> Linux datalab2 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 
> x86_64 x86_64 x86_64 GNU/Linux
> - `mamba list | grep -i "pyarrow\|tensorflow\|^python"`
> pyarrow                   6.0.0           py39hff6fa39_1_cpu    conda-forge
> python                    3.9.7           hb7a2778_3_cpython    conda-forge
> python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
> python-flatbuffers        1.12               pyhd8ed1ab_1    conda-forge
> python-irodsclient        1.0.0              pyhd8ed1ab_0    conda-forge
> python-rocksdb            0.7.0            py39h7fcd5f3_4    conda-forge
> python_abi                3.9                      2_cp39    conda-forge
> tensorflow                2.6.2           cuda112py39h9333c2f_0    conda-forge
> tensorflow-base           2.6.2           cuda112py39h7de589b_0    conda-forge
> tensorflow-estimator      2.6.2           cuda112py39h9333c2f_0    conda-forge
> tensorflow-gpu            2.6.2           cuda112py39h0bbbad9_0    conda-forge
>            Reporter: F. H.
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
