pitrou commented on issue #38676:
URL: https://github.com/apache/arrow/issues/38676#issuecomment-1808712708

   Second problem: the hang. This is a bit tricky and only occurs if the CSV 
reader returns an error _and_ the file object is a Python file object (for 
example `BytesIO`).
   
   Everything happens in this snippet:
   
https://github.com/apache/arrow/blob/1ff43ab5ee13de5c3130acf10c7aa8eb9680baab/python/pyarrow/_csv.pyx#L1266-L1271
   
   1. `reader.get().Read()` returns prematurely because of the error; it hasn't 
finished reading the file yet
   2. the `nogil` block exits
   3. the `read_csv` function exits raising a Python exception
   4. at this point, the `reader` cdef variable is destroyed (a C++ 
`shared_ptr[CCSVReader]` instance)
   5. the C++ `CSVReader` destructor waits for all the threaded read tasks to 
end
   6. a read task, running on another thread, is still trying to read another 
piece of data from the file object; it is waiting to take the GIL before 
calling `BytesIO.read`
   
   Notice that at point 5, the C++ `CSVReader` destructor waits for a task on 
another thread to end. At point 6, that task is waiting for the GIL. **But** 
at point 5, the GIL is still being held by the thread running the destructor.
   
   We end up with a deadlock around the GIL, because of a C++ destructor that 
runs without releasing the GIL.
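
   The cycle described above can be modelled in pure Python. This is only a 
sketch under stated assumptions, not the pyarrow code: a `threading.Lock` 
stands in for the GIL, `read_task`, `destroy_reader` and `run` are hypothetical 
names, and a timed `join` replaces the destructor's indefinite wait so that the 
example terminates instead of hanging.

```python
import threading

# Stand-in for the GIL. Assumption: a plain lock is enough to model the cycle.
gil = threading.Lock()


def read_task():
    # Point 6: the read task must take the "GIL" before it could call
    # something like BytesIO.read().
    with gil:
        pass


def destroy_reader(worker, release_first, timeout=0.5):
    """Hypothetical model of the destructor: wait for the read task to end.

    Returns True if the task finished within `timeout` seconds.
    """
    if release_first:
        # Drop the "GIL" around the wait, then take it back afterwards.
        gil.release()
        try:
            worker.join(timeout)
        finally:
            gil.acquire()
    else:
        # Point 5: wait while still holding the "GIL".
        worker.join(timeout)
    return not worker.is_alive()


def run(release_first):
    worker = threading.Thread(target=read_task, daemon=True)
    gil.acquire()  # the calling thread holds the "GIL" (points 2 to 4)
    try:
        worker.start()
        finished = destroy_reader(worker, release_first)
    finally:
        gil.release()
    worker.join()  # let the task drain so the next run starts clean
    return finished
```

   In this model, `run(False)` returns `False` because the wait times out while 
the task is blocked on the lock, and `run(True)` returns `True` because the 
task can take the lock once the waiting thread has released it. Releasing the 
lock around the wait is shown only to illustrate how the cycle is broken; it 
is not a claim about how the issue was fixed in Arrow.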

