lupko opened a new issue, #40004:
URL: https://github.com/apache/arrow/issues/40004
### Describe the bug, including details regarding any error messages,
version, and platform.
Hello,
I'm on Arrow 15.0, Python 3.11. I have a Flight RPC service that does
roughly the following:
- Declares and registers a custom UDF compute function (it combines calls
  to a couple of existing compute functions)
- During GetFlightInfo, the code prepares and starts Acero:
  - The Declaration uses a source node + a project node
  - The project node is configured so that the custom UDF is applied
    to some of the columns
  - A RecordBatchReader is created using Declaration.to_reader()
- During DoGet, the service returns `flight.GeneratorStream()` with the reader
  created using Declaration.to_reader()
  - The code uses `GeneratorStream` instead of `flight.RecordBatchStream`
    because `GetFlightInfo` may produce multiple pieces that need to be
    concatenated into the DoGet stream.
This service is then called from Java code, which quite reliably triggers
a full interpreter deadlock of the server process.
The thread dump of the stuck process:
https://gist.github.com/lupko/c4491df7a36247b48ba0248c2d5f9ae6
I experimented a bit, and when I took `GeneratorStream` out of the
picture (i.e. used `RecordBatchStream` instead), there were no deadlocks.
Following the traces from thread dump, I believe the problem is here:
https://github.com/apache/arrow/blob/de3cdc00c21fd3e9d8d763099591f23720ca8d1f/python/pyarrow/_flight.pyx#L2016
So in the end I think the situation is this: the call does not release the GIL
before it calls `Next()`. But `Next()` will never return, because to produce a
batch Acero needs to run the Python UDF, which needs the GIL as well.
I tried a fix on a local build (wrapping the call in `with nogil`) and
everything works. I will create a PR shortly.
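
To illustrate the suspected lock problem without Arrow, here is a minimal pure-Python sketch. Everything in it is a hypothetical stand-in and not a pyarrow API: the `gil` lock plays the role of the interpreter lock, `produce` plays the role of the Python UDF that Acero must run to make a batch, and `fetch_next` plays the role of the code that waits on `Next()`. A timeout is used so that the "deadlock" case returns `None` instead of hanging.

```python
import queue
import threading

# Hypothetical stand-in for the interpreter lock (not the real GIL).
gil = threading.Lock()


def produce(out):
    """Producer thread: like a Python UDF, it needs the lock to make a batch."""
    with gil:
        out.put("batch-0")


def fetch_next(release_lock_while_waiting, timeout=0.5):
    """Wait for the next batch while (optionally) holding the lock.

    Returns the batch, or None if the wait timed out, which models the
    deadlock: the producer can never run because the waiter owns the lock.
    """
    out = queue.Queue()
    gil.acquire()  # the caller is running Python code, so it owns the lock
    worker = threading.Thread(target=produce, args=(out,))
    worker.start()
    try:
        if release_lock_while_waiting:
            gil.release()  # analogous to entering a `with nogil` block
            try:
                return out.get(timeout=timeout)
            finally:
                gil.acquire()  # take the lock back before returning
        try:
            return out.get(timeout=timeout)
        except queue.Empty:
            return None  # producer was starved of the lock
    finally:
        gil.release()
        worker.join()


if __name__ == "__main__":
    print(fetch_next(release_lock_while_waiting=True))   # batch-0
    print(fetch_next(release_lock_while_waiting=False))  # None
```

Without the timeout, the second call would block forever, which matches what I believe the thread dump shows for the real server.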
### Component(s)
FlightRPC, Python