TomTJarosz opened a new issue, #39313:
URL: https://github.com/apache/arrow/issues/39313
### Describe the bug, including details regarding any error messages, version, and platform.
Concurrent calls to `_pandas_api.is_data_frame` can return an incorrect
result (`False` when given a DataFrame). This can cause failures in
higher-level public Arrow APIs that depend on it (such as `write_feather`).
The following pytest test reproduces the issue:
```py
from threading import Thread

import pandas as pd
from pyarrow.pandas_compat import _pandas_api


def test_is_data_frame_race_condition():
    wait = True
    num_threads = 10
    df = pd.DataFrame()
    results = []

    def rc():
        # Spin until the main thread releases all workers at once.
        while wait:
            pass
        results.append(_pandas_api.is_data_frame(df))

    threads = [Thread(target=rc) for _ in range(num_threads)]
    for t in threads:
        t.start()

    # Release every worker so the calls happen concurrently.
    wait = False

    for t in threads:
        t.join()

    assert len(results) == num_threads
    assert all(results), "is_data_frame() returned false when given a dataframe"
```
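For illustration, here is a minimal sketch of one way a race like this can arise. It assumes, without confirmation from the pyarrow source, that the pandas shim initializes lazily and marks itself as "already tried" before the first-time setup has finished. `LazyShim`, `is_ready`, and `run` are hypothetical names and not part of pyarrow; the `use_lock=True` path shows one possible mitigation, not necessarily the fix the project would choose.

```python
import threading
import time


class LazyShim:
    """Hypothetical stand-in for a lazily initialized shim (not pyarrow code)."""

    def __init__(self, use_lock):
        self._lock = threading.Lock() if use_lock else None
        self._tried = False   # set as soon as initialization starts
        self._ready = False   # set only once initialization finishes

    def _init_once(self):
        if self._tried:
            return
        self._tried = True    # flag flipped before the slow work completes
        time.sleep(0.05)      # stands in for a slow first-time import
        self._ready = True

    def is_ready(self):
        if self._lock is None:
            # Unprotected: other threads can observe _tried=True, _ready=False.
            self._init_once()
        else:
            # Protected: later callers block until initialization has finished.
            with self._lock:
                self._init_once()
        return self._ready


def run(use_lock, num_threads=10):
    """Call is_ready() from several threads released together; return results."""
    shim = LazyShim(use_lock)
    results = []
    barrier = threading.Barrier(num_threads)

    def worker():
        barrier.wait()
        results.append(shim.is_ready())

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Under these assumptions, `run(use_lock=False)` will typically contain some `False` entries, while `run(use_lock=True)` returns `True` for every thread.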
### Component(s)
Python