jorisvandenbossche commented on PR #702:
URL: https://github.com/apache/arrow-adbc/pull/702#issuecomment-1561157777
Small showcase:
```python
import pyarrow as pa  # needed below to import the capsule
import adbc_driver_sqlite.dbapi
conn = adbc_driver_sqlite.dbapi.connect()
cursor = conn.cursor()
# using those private methods for now, to get a handle object
# (instead of already a pyarrow object)
cursor._prepare_execute("SELECT 1 as a, 2.0 as b, 'Hello, world!' as c")
handle, _ = cursor._stmt.execute_query()
# manually getting the capsule and passing it to pyarrow for now
capsule = handle._to_capsule()
pa.RecordBatchReader._import_from_c_capsule(capsule).read_all()
# pyarrow.Table
# a: int64
# b: double
# c: string
# ----
# a: [[1]]
# b: [[2]]
# c: [["Hello, world!"]]
# trying to import it a second time raises an error
pa.RecordBatchReader._import_from_c_capsule(capsule).read_all()
# ...
# ArrowInvalid: Cannot import released ArrowArrayStream
# when the capsule object gets deleted/collected -> release callback is not
# called, because the stream was already consumed
del capsule
# but when the stream was not consumed, the capsule deleter will call the
# release callback
cursor._prepare_execute("SELECT 1 as a, 2.0 as b, 'Hello, world!' as c")
handle, _ = cursor._stmt.execute_query()
capsule = handle._to_capsule()
del capsule
# -> this calls the release callback
```
Some design questions about this for the adbc manager side:
- Currently the DBAPI methods like `execute(..)` already initialize the
pyarrow RecordBatchReader. We might want to delay that creation until
`fetch_arrow_table()` actually gets called? (or one of the other fetch variants
that end up consuming the RecordBatchReader as well)
And then we could, for example, have a `fetch_arrow_stream()` method that
returns a custom object with the appropriate protocol method, i.e.
`__arrow_c_stream__` (instead of the current dummy `_to_capsule()`)