lupko opened a new issue, #1523: URL: https://github.com/apache/arrow-adbc/issues/1523
Hello, I'm trying to integrate ADBC into a Flight RPC service. The typical use case is that one call (GetFlightInfo) performs a query, and the client then fetches the stream of results via DoGet. So in GetFlightInfo the service creates a new cursor, executes the query, and sends back an appropriate FlightInfo so that a DoGet with the right ticket will pick up the data. For this, the code calls `cursor.fetch_record_batch()`. The resulting RecordBatchReader is then wrapped in a `pyarrow.flight.RecordBatchStream` and returned.

Doing this crashes the entire server with SIGSEGV. You can find a reproducer in this gist: https://gist.github.com/lupko/8b6f165a6574ef830c531c8056b20957. For the sake of brevity, the reproducer skips GetFlightInfo.

Poking around the code of the ADBC Python wrappers, I _think_ this crash happens because `AdbcRecordBatchReader` is not ready for interop with PyArrow. PyArrow's `RecordBatchReader` (`pyarrow.lib.RecordBatchReader`) has a `reader` field that holds the actual C++ RecordBatchReader. PyArrow code usually grabs the underlying `reader` as soon as possible and uses it for various purposes (such as getting batches to send out via Flight RPC). `AdbcRecordBatchReader` does not extend `pyarrow.lib.RecordBatchReader`, so `reader` is simply not there and everything comes crashing down.
