TomTJarosz opened a new issue, #39313:
URL: https://github.com/apache/arrow/issues/39313

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Concurrent calls to `_pandas_api.is_data_frame` can return an incorrect
result (`False` when given a DataFrame). This can cause failures in
higher-level public Arrow APIs that rely on the check (such as `write_feather`).
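
One way a result like this can arise is a check-then-act race in lazily initialized state. The report does not confirm the cause, so the sketch below is an assumption for illustration only: `RacyShim`, `LockedShim`, and `observe_second_caller` are hypothetical names, not pyarrow code. It forces the interleaving in which a second caller sees "initialization already attempted" before the first caller has finished.

```python
import threading


class RacyShim:
    """Hypothetical lazily initialized shim (not pyarrow's actual code)."""

    def __init__(self, gate):
        self._tried = False
        self._have = False
        self._gate = gate                 # Event standing in for a slow import
        self.entered = threading.Event()  # set once initialization has begun

    def _check_import(self):
        if self._tried:
            return
        self._tried = True   # "tried" is published before the result is known
        self.entered.set()
        self._gate.wait()    # simulated slow import
        self._have = True

    def is_ready(self):
        self._check_import()
        return self._have


class LockedShim(RacyShim):
    """Same shim, with the whole initialization guarded by a lock."""

    def __init__(self, gate):
        super().__init__(gate)
        self._lock = threading.Lock()

    def _check_import(self):
        with self._lock:
            super()._check_import()


def observe_second_caller(shim_cls):
    """Return what a second thread sees while the first is mid-initialization."""
    gate = threading.Event()
    shim = shim_cls(gate)
    first = threading.Thread(target=shim.is_ready)
    first.start()
    shim.entered.wait()          # first thread is now inside _check_import
    seen = []
    second = threading.Thread(target=lambda: seen.append(shim.is_ready()))
    second.start()
    second.join(timeout=0.5)     # racy shim returns at once; locked shim blocks
    gate.set()                   # let the simulated import finish
    first.join()
    second.join()
    return seen[0]
```

In this sketch the second caller of `RacyShim` gets a stale negative answer, while the second caller of `LockedShim` waits for initialization to complete before answering.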
   
   The following pytest reproduces the issue:
   ```py
   import pandas as pd
   from pyarrow.pandas_compat import _pandas_api
   from threading import Thread
   
   def test_is_data_frame_race_condition():
       wait = True
       num_threads = 10
       df = pd.DataFrame()
       results = []
       def rc():
           # Spin until the main thread releases all workers at once.
           while wait:
               pass
           results.append(_pandas_api.is_data_frame(df))
   
       threads = [Thread(target=rc) for _ in range(num_threads)]
       for t in threads:
           t.start()
   
       wait = False
   
       for t in threads:
           t.join()
   
       assert len(results) == num_threads
       assert all(results), "is_data_frame() returned false when given a dataframe"
   ```
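
The reproducer above releases its workers by flipping a flag that each thread polls in a busy loop. A possible variant, sketched here with only the standard library, uses `threading.Barrier` so that all workers are released together without spinning. The helper name `run_concurrently` is hypothetical and not part of pyarrow or pytest.

```python
import threading


def run_concurrently(fn, num_threads=10):
    """Call fn() from num_threads threads released together; return the results."""
    barrier = threading.Barrier(num_threads)
    results = [None] * num_threads

    def worker(index):
        barrier.wait()           # block until every worker has reached this point
        results[index] = fn()    # each worker writes only to its own slot

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With this helper, the test body could reduce to checking `all(run_concurrently(lambda: _pandas_api.is_data_frame(df)))`, assuming the same `df` and imports as in the reproducer.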
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
