[GitHub] [arrow] paleolimbot commented on issue #35531: [Python] Add Python protocol for the Arrow C (Data/Stream) Interface

via GitHub Wed, 17 May 2023 06:35:27 -0700


paleolimbot commented on issue #35531:
URL: https://github.com/apache/arrow/issues/35531#issuecomment-1551412155


   Just a note that I think `__arrow_c_array__` and `__arrow_c_schema__` are 
rather essential (I'd build nanoarrow's Python support on top of them). I think 
it's fairly uncontroversial that their behaviour should align with 
`__arrow_c_array_stream__`. A concrete example of somewhere that might 
implement `__arrow_c_schema__` is a GeoArrow type representation (currently 
these are stored as something more like an integer type ID because it's faster). 
Substrait types could also implement it, or maybe pandas dtypes. It would be 
rather useful if numpy/pandas.Series implemented `__arrow_c_array__`, no?
   
   I don't know if it was mentioned in the discussion, but I think it's 
fairly important that the PyCapsule have a finalizer that calls the `release()` 
callback (if non-null), and to have that documented. I assume that's the point 
of using the PyCapsule but I haven't discussed that with anybody except maybe 
in passing with Joris.
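
   To make the finalizer idea concrete, here is a rough CPython-only sketch using `ctypes`. Everything named `Toy...`/`toy_...`, the capsule name, and the one-field struct are assumptions made up for illustration (a real `ArrowSchema` has more members and would normally be heap-allocated and freed in C); only `PyCapsule_New` and `PyCapsule_GetPointer` are real C API calls. The finalizer calls `release()` only when it is still non-null, i.e. when nobody consumed the struct.

```python
import ctypes


class ToySchema(ctypes.Structure):
    """Hypothetical stand-in for ArrowSchema; only `release` is modeled."""


RELEASE_FN = ctypes.CFUNCTYPE(None, ctypes.POINTER(ToySchema))
ToySchema._fields_ = [("release", RELEASE_FN)]

CAPSULE_NAME = b"toy_arrow_schema"  # assumed name; must outlive the capsule
release_calls = []  # makes release() calls observable


@RELEASE_FN
def toy_release(schema_ptr):
    release_calls.append("released")
    # A release callback marks the struct as released by nulling `release`.
    ctypes.memset(ctypes.addressof(schema_ptr.contents), 0,
                  ctypes.sizeof(ToySchema))


# The destructor receives the capsule as a raw address (not py_object), so
# ctypes never touches the reference count of the capsule being destroyed.
DESTRUCTOR = ctypes.CFUNCTYPE(None, ctypes.c_void_p)
capsule_get_pointer = ctypes.pythonapi.PyCapsule_GetPointer
capsule_get_pointer.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
capsule_get_pointer.restype = ctypes.c_void_p
capsule_new = ctypes.pythonapi.PyCapsule_New
capsule_new.argtypes = [ctypes.c_void_p, ctypes.c_char_p, DESTRUCTOR]
capsule_new.restype = ctypes.py_object


@DESTRUCTOR
def capsule_finalizer(capsule_addr):
    addr = capsule_get_pointer(capsule_addr, CAPSULE_NAME)
    schema = ToySchema.from_address(addr)
    if schema.release:  # non-null: the struct was never consumed
        schema.release(ctypes.pointer(schema))


def export_schema():
    # The struct is returned too, only so this toy keeps its memory alive.
    schema = ToySchema(release=toy_release)
    capsule = capsule_new(ctypes.addressof(schema), CAPSULE_NAME,
                          capsule_finalizer)
    return capsule, schema


# Case 1: the capsule is dropped unconsumed, so the finalizer releases.
capsule, schema = export_schema()
del capsule
calls_when_unconsumed = len(release_calls)

# Case 2: a consumer moves the struct out and nulls the source first,
# so the finalizer finds release == NULL and does nothing.
capsule, schema = export_schema()
moved = ToySchema.from_buffer_copy(schema)
ctypes.memset(ctypes.addressof(schema), 0, ctypes.sizeof(ToySchema))
del capsule
calls_when_consumed = len(release_calls) - calls_when_unconsumed

# The consumer is now responsible for releasing its own copy.
moved.release(ctypes.pointer(moved))
```

   In the second case the release responsibility has moved to the consumer, which is why the null check in the finalizer matters.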
   
   > Do we want to distinguish between an array and a tabular version?
   
   Most ways that I know about to create an ArrowArray (`pa.array()`, 
`pa.record_batch()`, `arrow::as_arrow_array()`, etc.) also accept a `type` or 
`schema`. Above the level of "array or table", there are certainly objects 
whose "one true Arrow type" is ambiguous. You could do `__arrow_c_array__(self, 
schema=None)` and `__arrow_c_array_stream__(self, schema=None)`. That gets a 
little hard because then either the producer or the consumer has to do some 
sort of equality check or validation.
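
   A toy sketch of what that validation could look like, with checks on both sides. This is not the proposed API: `ToyColumn` and `consume` are hypothetical, plain strings such as `"int64"` stand in for an `ArrowSchema`, and a real `__arrow_c_array__` would return PyCapsule objects rather than a Python tuple.

```python
class ToyColumn:
    """Hypothetical producer; strings like "int64" stand in for ArrowSchema."""

    SUPPORTED = ("int64", "float64")

    def __init__(self, values):
        self.values = list(values)

    def __arrow_c_array__(self, schema=None):
        # With no request, the producer picks its own default type.
        target = self.SUPPORTED[0] if schema is None else schema
        if target not in self.SUPPORTED:
            # Producer-side validation of the requested type.
            raise TypeError(f"cannot export as {target!r}")
        cast = float if target == "float64" else int
        # A real implementation would return PyCapsule objects instead.
        return target, [cast(v) for v in self.values]


def consume(obj, schema=None):
    got, data = obj.__arrow_c_array__(schema=schema)
    # Consumer-side equality check, in case the producer ignored the request.
    if schema is not None and got != schema:
        raise TypeError(f"asked for {schema!r}, got {got!r}")
    return got, data


col = ToyColumn([1, 2, 3])
default_result = consume(col)
float_result = consume(col, schema="float64")
```

   Whether a mismatch should raise or fall back to a consumer-side cast is exactly the kind of thing that would need to be pinned down.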
   
   Did you envision that `__arrow_c_stream__()` could return things that are 
not tables? They certainly can and do outside pyarrow (I believe Rust2 supports 
it...nanoarrow in R does too). It's a fairly useful representation of a 
ChunkedArray since there's no other officially ABIified way to do that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

