jorisvandenbossche commented on issue #35531: URL: https://github.com/apache/arrow/issues/35531#issuecomment-1561063850
> I don't know if it was mentioned in the discussion, but I think it's fairly important that the PyCapsule have a finalizer that calls the `release()` callback (if non-null), and to have that documented.

Yes, that's certainly the idea, as discussed in https://github.com/apache/arrow/issues/34031

> Most ways that I know about to create an ArrowArray (`pa.array()`, `pa.record_batch()`, `arrow::as_arrow_array()`, etc.) also accept a `type` or `schema`. ...

That's a good question. Other protocols like `__array__` or `__arrow_array__` (which are library-specific: numpy and pyarrow, respectively) also support this. But more similar protocols like `__array_interface__` or `__dlpack__` (which both also share pointers to buffers and are library-independent) don't.

I think one aspect here is that this second set of methods assumes you "just" give access to the actual, underlying data without doing any conversion (and so are always zero-copy), or otherwise raise an error if the type is not supported through the protocol (so there is never a question of what the "correct" type would be for the C interface). In the DataFrame world there is more variation in data types, though, so this might be less straightforward. You will much more easily end up in a situation where the DataFrame object has a column of data that is not exactly / natively in Arrow memory, and in those cases there might indeed be some ambiguity about which type should be used — or about whether such a conversion should be supported at all, rather than raising an error.
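For illustration, the finalizer idea can be sketched in pure Python with `ctypes`: a PyCapsule whose destructor calls the `ArrowArray.release` callback if it is still non-null. The struct layout follows the Arrow C Data Interface (with `children`/`dictionary` simplified to `void*`); the capsule name `"arrow_array"` and all the wiring here are illustrative assumptions, not an existing pyarrow API.

```python
import ctypes

class ArrowArray(ctypes.Structure):
    pass  # incomplete type so the release callback can reference itself

ReleaseFunc = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowArray))

ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.c_void_p),    # simplified: really ArrowArray**
    ("dictionary", ctypes.c_void_p),  # simplified: really ArrowArray*
    ("release", ReleaseFunc),
    ("private_data", ctypes.c_void_p),
]

# Declare the two CPython capsule functions we use (raw pointers only, so we
# never touch the refcount of the capsule while it is being deallocated).
ctypes.pythonapi.PyCapsule_New.restype = ctypes.py_object
ctypes.pythonapi.PyCapsule_New.argtypes = [
    ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
ctypes.pythonapi.PyCapsule_GetPointer.restype = ctypes.c_void_p
ctypes.pythonapi.PyCapsule_GetPointer.argtypes = [
    ctypes.c_void_p, ctypes.c_char_p]

released = []

@ReleaseFunc
def _release(ptr):
    # A real producer would free buffers/children here and then mark the
    # struct as released by nulling the callback; we only record the call.
    released.append(True)

CapsuleDestructor = ctypes.CFUNCTYPE(None, ctypes.c_void_p)

@CapsuleDestructor
def _capsule_destructor(capsule_ptr):
    ptr = ctypes.pythonapi.PyCapsule_GetPointer(capsule_ptr, b"arrow_array")
    arr = ctypes.cast(ptr, ctypes.POINTER(ArrowArray))
    if arr.contents.release:  # only call release() if still non-null
        arr.contents.release(arr)

arr = ArrowArray()
arr.release = _release
capsule = ctypes.pythonapi.PyCapsule_New(
    ctypes.cast(ctypes.byref(arr), ctypes.c_void_p),
    b"arrow_array",
    ctypes.cast(_capsule_destructor, ctypes.c_void_p))

del capsule  # the capsule finalizer runs and invokes release()
assert released == [True]
```

The point of the pattern is that even if the consumer drops the capsule on the floor without ever importing the array, the producer's `release()` still runs exactly once.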

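The design question above can be contrasted with a toy sketch: one export method in the style of `__array_interface__`/`__dlpack__` (always zero-copy, error on any other requested type) versus one in the style of `pa.array(..., type=...)` (honor the request, converting when needed). All names here (`NativeColumn`, `export_strict`, `export_casting`) are hypothetical and exist only to illustrate the trade-off.

```python
class NativeColumn:
    """A toy column whose values are stored in a single 'native' type."""

    def __init__(self, values, dtype):
        self.values = values
        self.dtype = dtype

    def export_strict(self, requested_dtype=None):
        # Style of __array_interface__ / __dlpack__: expose the underlying
        # data as-is (always zero-copy); any other requested type is an error.
        if requested_dtype is not None and requested_dtype != self.dtype:
            raise TypeError(f"cannot export {self.dtype} as {requested_dtype}")
        return self.dtype, self.values

    def export_casting(self, requested_dtype=None):
        # Style of pa.array(..., type=...): honor the request, converting
        # (and thus copying) when the native type differs.
        dtype = requested_dtype or self.dtype
        if dtype == self.dtype:
            return dtype, self.values  # zero-copy path
        if (self.dtype, dtype) == ("int32", "int64"):
            return dtype, list(self.values)  # toy widening "cast" (copies)
        raise TypeError(f"unsupported cast {self.dtype} -> {dtype}")


col = NativeColumn([1, 2, 3], "int32")
assert col.export_strict() == ("int32", [1, 2, 3])
assert col.export_casting("int64") == ("int64", [1, 2, 3])
```

The strict variant never has to decide what the "correct" target type is, at the cost of failing on data that is not already in Arrow memory; the casting variant is more convenient for DataFrame-like producers but silently gives up the zero-copy guarantee.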