pitrou commented on issue #38676: URL: https://github.com/apache/arrow/issues/38676#issuecomment-1808712708
Second problem: the hang. This is a bit tricky and only occurs if the CSV reader returns an error _and_ the file object is a Python file object (for example `BytesIO`).

Everything happens in this snippet:
https://github.com/apache/arrow/blob/1ff43ab5ee13de5c3130acf10c7aa8eb9680baab/python/pyarrow/_csv.pyx#L1266-L1271

1. `reader.get().Read()` returns prematurely because of the error; it hasn't finished reading the file yet
2. the `nogil` block exits
3. the `read_csv` function exits raising a Python exception
4. at this point, the `reader` cdef variable is destroyed (a C++ `shared_ptr[CCSVReader]` instance)
5. the C++ `CSVReader` destructor waits for all the threaded read tasks to end
6. a read task, running on another thread, is still trying to read another piece of data from the file object; it is waiting to take the GIL before calling `BytesIO.read`

Notice that at point 5, the C++ `CSVReader` destructor waits for a task on another thread to end. At point 6, that task is waiting for the GIL. **But** at point 5, the GIL is still being held. We end up with a deadlock around the GIL, because of a C++ destructor that runs without releasing the GIL.
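
For illustration, here is a minimal pure-Python sketch of the same lock-ordering problem. It is not the pyarrow code: an ordinary `threading.Lock` stands in for the GIL, and the names `gil`, `read_task` and `destroy_reader` are hypothetical. A timeout is used so that the sketch reports the deadlock instead of actually hanging.

```python
import threading

# Hypothetical stand-in for the GIL; the real GIL is not a threading.Lock.
gil = threading.Lock()


def read_task(started, done):
    """Stand-in for the threaded read task (point 6)."""
    started.set()
    # Like the task calling BytesIO.read: it must take the "GIL" first.
    with gil:
        pass
    done.set()


def destroy_reader(release_gil_first, timeout=0.5):
    """Stand-in for the destructor waiting on the read task (point 5).

    Returns True if the task finished within `timeout`, False if the
    wait would have blocked (the deadlock case).
    """
    started = threading.Event()
    done = threading.Event()
    gil.acquire()  # the caller holds the "GIL", as after the nogil block exits
    worker = threading.Thread(target=read_task, args=(started, done), daemon=True)
    worker.start()
    started.wait()
    try:
        if release_gil_first:
            # Release the "GIL" while waiting, then take it back.
            gil.release()
            finished = done.wait(timeout)
            gil.acquire()
        else:
            # Wait while still holding the "GIL": the task can never proceed.
            finished = done.wait(timeout)
    finally:
        gil.release()
    worker.join()  # the task can always finish once the "GIL" is free
    return finished


if __name__ == "__main__":
    print(destroy_reader(release_gil_first=False))  # False: wait timed out
    print(destroy_reader(release_gil_first=True))   # True: task completed
```

In this sketch, waiting while holding the lock never succeeds, whereas releasing the lock around the wait lets the task complete. That mirrors the situation described above, where the destructor runs with the GIL held.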
