jorisvandenbossche commented on issue #35531: URL: https://github.com/apache/arrow/issues/35531#issuecomment-1561063850
> I don't know if it was mentioned in the discussion, but I think it's fairly important that the PyCapsule have a finalizer that calls the `release()` callback (if non-null), and to have that documented.

Yes, that's certainly the idea, as discussed in https://github.com/apache/arrow/issues/34031

> Most ways that I know about to create an ArrowArray (`pa.array()`, `pa.record_batch()`, `arrow::as_arrow_array()`, etc.) also accept a `type` or `schema`. ...

That's a good question. Other protocols like `__array__` or `__arrow_array__` (which are library-specific: numpy and pyarrow, respectively) also support this. But more similar protocols like `__array_interface__` or `__dlpack__` (which both also share pointers to buffers and are library-independent) don't.

I think one aspect here is that this second set of methods assumes you "just" give access to the actual, underlying data without doing any conversion (and so are always zero-copy), or otherwise raise an error if the type is not supported through the protocol (so there is never a question of what the "correct" type would be for the C interface). In the DataFrame world there is more variation in data types, though, so this might be less straightforward. You will much more easily end up in a situation where the DataFrame object has a column of data that is not exactly / natively in Arrow memory, and in those cases there might indeed be some ambiguity about which type should be used — or about whether such a conversion should be supported at all, rather than raising an error.
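For illustration, the finalizer idea can be sketched in pure Python with `ctypes`: a PyCapsule whose destructor calls the `ArrowArray.release` callback if it is still non-null. The struct layout follows the Arrow C Data Interface (with `children`/`dictionary` simplified to `void*`); the capsule name `"arrow_array"` and all the wiring here are illustrative assumptions, not an existing pyarrow API.

```python
import ctypes

class ArrowArray(ctypes.Structure):
    pass  # incomplete type so the release callback can reference itself

ReleaseFunc = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowArray))

ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.c_void_p),    # simplified: really ArrowArray**
    ("dictionary", ctypes.c_void_p),  # simplified: really ArrowArray*
    ("release", ReleaseFunc),
    ("private_data", ctypes.c_void_p),
]

# Declare the two CPython capsule functions we use (raw pointers only, so we
# never touch the refcount of the capsule while it is being deallocated).
ctypes.pythonapi.PyCapsule_New.restype = ctypes.py_object
ctypes.pythonapi.PyCapsule_New.argtypes = [
    ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
ctypes.pythonapi.PyCapsule_GetPointer.restype = ctypes.c_void_p
ctypes.pythonapi.PyCapsule_GetPointer.argtypes = [
    ctypes.c_void_p, ctypes.c_char_p]

released = []

@ReleaseFunc
def _release(ptr):
    # A real producer would free buffers/children here and then mark the
    # struct as released by nulling the callback; we only record the call.
    released.append(True)

CapsuleDestructor = ctypes.CFUNCTYPE(None, ctypes.c_void_p)

@CapsuleDestructor
def _capsule_destructor(capsule_ptr):
    ptr = ctypes.pythonapi.PyCapsule_GetPointer(capsule_ptr, b"arrow_array")
    arr = ctypes.cast(ptr, ctypes.POINTER(ArrowArray))
    if arr.contents.release:  # only call release() if still non-null
        arr.contents.release(arr)

arr = ArrowArray()
arr.release = _release
capsule = ctypes.pythonapi.PyCapsule_New(
    ctypes.cast(ctypes.byref(arr), ctypes.c_void_p),
    b"arrow_array",
    ctypes.cast(_capsule_destructor, ctypes.c_void_p))

del capsule  # the capsule finalizer runs and invokes release()
assert released == [True]
```

The point of the pattern is that even if the consumer drops the capsule on the floor without ever importing the array, the producer's `release()` still runs exactly once.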

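The design question above can be contrasted with a toy sketch: one export method in the style of `__array_interface__`/`__dlpack__` (always zero-copy, error on any other requested type) versus one in the style of `pa.array(..., type=...)` (honor the request, converting when needed). All names here (`NativeColumn`, `export_strict`, `export_casting`) are hypothetical and exist only to illustrate the trade-off.

```python
class NativeColumn:
    """A toy column whose values are stored in a single 'native' type."""

    def __init__(self, values, dtype):
        self.values = values
        self.dtype = dtype

    def export_strict(self, requested_dtype=None):
        # Style of __array_interface__ / __dlpack__: expose the underlying
        # data as-is (always zero-copy); any other requested type is an error.
        if requested_dtype is not None and requested_dtype != self.dtype:
            raise TypeError(f"cannot export {self.dtype} as {requested_dtype}")
        return self.dtype, self.values

    def export_casting(self, requested_dtype=None):
        # Style of pa.array(..., type=...): honor the request, converting
        # (and thus copying) when the native type differs.
        dtype = requested_dtype or self.dtype
        if dtype == self.dtype:
            return dtype, self.values  # zero-copy path
        if (self.dtype, dtype) == ("int32", "int64"):
            return dtype, list(self.values)  # toy widening "cast" (copies)
        raise TypeError(f"unsupported cast {self.dtype} -> {dtype}")


col = NativeColumn([1, 2, 3], "int32")
assert col.export_strict() == ("int32", [1, 2, 3])
assert col.export_casting("int64") == ("int64", [1, 2, 3])
```

The strict variant never has to decide what the "correct" target type is, at the cost of failing on data that is not already in Arrow memory; the casting variant is more convenient for DataFrame-like producers but silently gives up the zero-copy guarantee.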