[ 
https://issues.apache.org/jira/browse/ARROW-10784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242491#comment-17242491
 ] 

Micah Kornfield commented on ARROW-10784:
-----------------------------------------

I think the bug is actually in Python/Cython somewhere. The current theory
is that, with two threads running:

 

1.  Thread 1 starts loading pyarrow.compute, which begins executing
_make_global_functions
(https://github.com/apache/arrow/blob/master/python/pyarrow/compute.py#L218).
The GIL is released partway through (it looks like there is a "with nogil:"
block inside the get_function call).

2.  Thread 2 imports pyarrow.compute (no initialization happens here). Thread
2 then calls pyarrow.compute.flatten, which hasn't been installed in the module
globals yet because thread 1 hasn't finished running _make_global_functions.
This raises an AttributeError.
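The interleaving in steps 1 and 2 can be modeled without pyarrow. This is a simulation only: fake_compute, make_global_functions and the two Events are hypothetical stand-ins, and the Events force the ordering deterministically where the real race depends on thread scheduling. The pause inside the loop plays the role of the GIL being released inside get_function.

```python
import threading
import types

# Hypothetical stand-in for the pyarrow.compute module object: other threads
# can reach it before the populate step below has finished.
fake_compute = types.ModuleType("fake_compute")

def _make_wrapper(name):
    # Hypothetical stand-in for a generated compute-function wrapper.
    def wrapper(*args):
        return name
    return wrapper

# Events that force the interleaving from steps 1 and 2.
paused_mid_init = threading.Event()  # thread 1 has "released the GIL"
resume_init = threading.Event()      # thread 1 may continue

def make_global_functions(names):
    """Simulated populate loop in the spirit of _make_global_functions()."""
    for i, name in enumerate(names):
        if i == 1:
            # Stand-in for the "with nogil:" block in get_function():
            # another thread runs while registration is incomplete.
            paused_mid_init.set()
            resume_init.wait()
        setattr(fake_compute, name, _make_wrapper(name))

outcome = {}

def caller():
    paused_mid_init.wait()           # act only once thread 1 is mid-init
    try:
        fake_compute.flatten([1, 2])
        outcome["result"] = "ok"
    except AttributeError:
        outcome["result"] = "AttributeError"
    finally:
        resume_init.set()            # let thread 1 finish

t1 = threading.Thread(target=make_global_functions, args=(["add", "flatten"],))
t2 = threading.Thread(target=caller)
t2.start()
t1.start()
t1.join()
t2.join()

print(outcome["result"])                 # AttributeError
print(hasattr(fake_compute, "flatten"))  # True
```

The caller fails even though the module object already exists, and the attribute appears moments later once initialization resumes, which matches the symptom described above.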

 

I'm trying to make a minimal repro test case, but the code path in question
comes from lazily loading compute within an Array (i.e. a call to _pc()
defined in lib.pyx).
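One common way to close this kind of window is to build the namespace completely before publishing it, and to guard the one-time initialization with a lock. The following is a minimal sketch of that general pattern, not the fix adopted for this issue: lazy_compute, _build_namespace and the other names are hypothetical, and it assumes CPython, where rebinding a module-level name is atomic, which is what makes the unlocked fast path safe.

```python
import threading
import types

def _make_wrapper(name):
    # Hypothetical stand-in for a generated compute-function wrapper.
    def wrapper(*args):
        return name
    return wrapper

def _build_namespace():
    # Populate a fresh module object completely before anyone can see it.
    mod = types.ModuleType("fake_compute")
    for name in ("add", "flatten"):
        setattr(mod, name, _make_wrapper(name))
    return mod

_lock = threading.Lock()
_namespace = None  # published only once it is fully populated

def lazy_compute():
    """Illustrative thread-safe variant of a _pc()-style lazy accessor."""
    global _namespace
    ns = _namespace
    if ns is None:                    # fast path: no lock once initialized
        with _lock:
            if _namespace is None:    # re-check: another thread may have won
                _namespace = _build_namespace()
            ns = _namespace
    return ns

# Usage: many threads hit the accessor at the same moment.
N = 8
start = threading.Barrier(N)
results, errors = [], []

def worker():
    start.wait()
    try:
        results.append(lazy_compute().flatten([1, 2]))
    except AttributeError as exc:
        errors.append(exc)

threads = [threading.Thread(target=worker) for _ in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results), len(errors))  # 8 0
```

Because _namespace is assigned only after every function has been set, a thread that sees a non-None value always sees the complete namespace, and threads arriving during initialization wait on the lock instead of reading a half-built object. A simpler user-side workaround that should have the same effect is to import pyarrow.compute on the main thread before starting any worker threads.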

> [Python] Loading pyarrow.compute isn't thread safe
> --------------------------------------------------
>
>                 Key: ARROW-10784
>                 URL: https://issues.apache.org/jira/browse/ARROW-10784
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 2.0.0
>            Reporter: Micah Kornfield
>            Priority: Major
>             Fix For: 3.0.0
>
>
> When using Arrow in a multithreaded environment, it is possible to trigger an
> initialization race on the pyarrow.compute module when calling Array.flatten.
>
> Flatten calls _pc(), which imports pyarrow.compute, but if two threads call
> flatten at the same time, it is possible that the global initialization of
> functions from the registry will be incomplete and therefore cause an
> AttributeError.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
