[
https://issues.apache.org/jira/browse/ARROW-10784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242491#comment-17242491
]
Micah Kornfield commented on ARROW-10784:
-----------------------------------------
I think the bug is actually somewhere in Python/Cython. The current theory
is that with two threads running:
1. Thread 1 starts loading pyarrow.compute (which will start executing
_make_global_functions
(https://github.com/apache/arrow/blob/master/python/pyarrow/compute.py#L218)).
The GIL is yielded (it looks like there is a "with nogil:" inside the
get_function call).
2. Thread 2 imports pyarrow.compute (no initialization happens here). Thread
2 then calls pyarrow.compute.flatten, which hasn't been installed in globals
yet because thread 1 hasn't finished running _make_global_functions. This
raises an error.
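The interleaving in the two steps above can be sketched without pyarrow at all. The following is only a simulation under stated assumptions: fake_compute, make_global_functions, and the two events are hypothetical stand-ins, and the events force the ordering that the GIL release is suspected to allow in the real module.

```python
import threading
import types

# Hypothetical stand-in for the pyarrow.compute module object (not the real module).
fake_compute = types.SimpleNamespace()

init_paused = threading.Event()   # set when thread 1 is partway through initialization
reader_done = threading.Event()   # set when thread 2 has attempted its call

def make_global_functions():
    """Install functions one at a time, like a module body filling its globals."""
    for name in ("add", "flatten"):
        if name == "flatten":
            # Simulate the point where the GIL is released mid-initialization.
            init_paused.set()
            reader_done.wait(timeout=5)
        setattr(fake_compute, name, lambda *args, _n=name: _n)

errors = []

def reader():
    """Thread 2: sees the namespace before 'flatten' has been installed."""
    init_paused.wait(timeout=5)
    try:
        fake_compute.flatten()
    except AttributeError as exc:
        errors.append(exc)
    finally:
        reader_done.set()

t1 = threading.Thread(target=make_global_functions)
t2 = threading.Thread(target=reader)
t1.start()
t2.start()
t1.join()
t2.join()

print(len(errors))                       # thread 2 hit the half-built namespace
print(hasattr(fake_compute, "flatten"))  # thread 1 finished afterwards
```

In this sketch the second thread fails only because it looks up the attribute before the first thread has finished installing it; once initialization completes, the same lookup succeeds.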
I'm trying to make a minimal repro test case, but the code path in question is
from lazy loading compute within an Array (i.e. a call to _pc() defined in
lib.pyx).
> [Python] Loading pyarrow.compute isn't thread safe
> --------------------------------------------------
>
> Key: ARROW-10784
> URL: https://issues.apache.org/jira/browse/ARROW-10784
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 2.0.0
> Reporter: Micah Kornfield
> Priority: Major
> Fix For: 3.0.0
>
>
> When using Arrow in a multithreaded environment it is possible to trigger an
> initialization race on the pyarrow.compute module when calling Array.flatten.
>
> Flatten calls _pc(), which imports pyarrow.compute, but if two threads call
> flatten at the same time it is possible that the global initialization of
> functions from the registry will be incomplete and therefore cause an
> AttributeError.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)