rishav394 opened a new pull request, #4366:
URL: https://github.com/apache/arrow-adbc/pull/4366

   ## What
   
   Use-after-free in `_reader.pyx` causes SIGSEGV when 
`pyarrow.RecordBatchReader._import_from_c` rejects the stream's schema (e.g. 
unsupported format string like Decimal32/64 on PyArrow < 15).
   
   ## Root cause
   
   The reader in `_reader.pyx` shallow-copies the `ArrowArrayStream` and passes 
the original to PyArrow's `_import_from_c`. On failure, PyArrow calls 
`release()` on the stream (per the Arrow C Data Interface spec), which sets 
`release = NULL` on the original only. `check_error(e)` then dereferences the 
already-released stream through the shallow copy, triggering a segfault.
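
To illustrate why the shallow copy is dangerous, here is a toy pure-Python model of the failure path. It is a sketch only, not the Cython code in `_reader.pyx`: `FakeStream`, `producer_release`, and `failing_import` are hypothetical stand-ins for `ArrowArrayStream`, the producer's release callback, and a consumer that rejects the schema.

```python
import copy
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FakeStream:
    """Toy stand-in for ArrowArrayStream (hypothetical, for illustration)."""
    release: Optional[Callable[["FakeStream"], None]]
    private_data: Optional[dict]


def producer_release(stream: FakeStream) -> None:
    # Free the producer state, then mark the struct itself as released.
    stream.private_data["freed"] = True
    stream.private_data = None
    stream.release = None


def failing_import(stream: FakeStream) -> None:
    # Models a consumer that rejects the schema: release first, then raise.
    stream.release(stream)
    raise ValueError("unsupported format string")


original = FakeStream(release=producer_release, private_data={"freed": False})
shallow = copy.copy(original)  # the copy kept around for error reporting

try:
    failing_import(original)
except ValueError:
    pass

# Only the original was marked released; the copy never saw that write
# and still points at producer state that has been freed.
print(original.release is None)        # True
print(shallow.release is None)         # False
print(shallow.private_data["freed"])   # True
```

In C the last line corresponds to reading freed memory, which is where the segfault comes from.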
   
   ## Fix
   
   After `_import_from_c` raises, check if `c_stream.release == NULL`. If so, 
PyArrow has already released the stream, so re-raise the original exception 
directly instead of calling `check_error` on dangling memory.
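
The control flow of this guard can be sketched in plain Python. This is a model under stated assumptions, not the actual patch: `Stream`, `consumer_import`, `check_error`, and `guarded_import` are hypothetical names, and the `check_error` stub raises so that any accidental call on a released stream is visible.

```python
import copy


class Stream:
    """Hypothetical stand-in for the C stream struct."""

    def __init__(self, release, private_data):
        self.release = release
        self.private_data = private_data


def release(stream):
    # Producer callback: drop state and mark the struct as released.
    stream.private_data = None
    stream.release = None


def consumer_import(stream):
    # Hypothetical consumer: rejects the schema, releases the stream, raises.
    stream.release(stream)
    raise ValueError("unsupported format string")


def check_error(helper_copy, exc):
    # Hypothetical error enrichment; only safe while the stream is live.
    raise RuntimeError("touched a released stream")


def guarded_import(stream):
    helper_copy = copy.copy(stream)
    try:
        consumer_import(stream)
    except Exception as exc:
        if stream.release is None:
            # The consumer already released the stream: re-raise unchanged.
            raise
        check_error(helper_copy, exc)
        raise


stream = Stream(release=release, private_data={"rows": 3})
try:
    guarded_import(stream)
except ValueError as exc:
    outcome = f"re-raised: {exc}"

print(outcome)  # re-raised: unsupported format string
```

The check is made on the original struct, since that is the one the consumer marks as released; the copy still carries the stale callback.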
   
   ## Reproduction
   
   Minimal: any ADBC driver returning a schema with a format string unsupported 
by the consumer's PyArrow version (e.g. Decimal32/64 from arrow-go based 
drivers consumed by PyArrow < 15).
   
   ```python
   # Trino + adbc-driver-flightsql or adbc-driver-trino
   # SELECT CAST(10.1 AS DECIMAL(10,4))
   # -> segfault on PyArrow 11-14
   ```
   
   ## Test
   
   Added `test_import_invalid_format_raises`, which poisons an exported 
stream's child format with an invalid string and verifies that `ArrowInvalid` 
is raised instead of the process crashing.

