jorisvandenbossche opened a new issue, #38325:
URL: https://github.com/apache/arrow/issues/38325

   We added a new protocol exposing the C Data Interface (schema, array and 
stream) in Python through PyCapsule objects and new dunder methods 
`__arrow_c_schema/array/stream__` (https://github.com/apache/arrow/issues/35531 
/ https://github.com/apache/arrow/pull/37797).
   
   We recently also expanded the C Data Interface with device capabilities: 
https://arrow.apache.org/docs/dev/format/CDeviceDataInterface.html 
(https://github.com/apache/arrow/pull/34972).
   
   The currently merged PyCapsule protocol uses the stable non-device 
interface, so the question is how to integrate the device version into the 
protocol in order to expose the C Device Data Interface in Python as well. Some 
options:
   
   1) Only support the device versions going forward (just as only the CPU 
version is supported today), i.e. the returned capsules would always contain a 
device array/stream.
     <sub>(This is a backwards-incompatible change, but given that we labeled 
the protocol as experimental, we can still make such changes if we think this 
is the best long-term option. The capsule names would reflect the change, so a 
consumer or producer that has not yet been updated would get a proper Python 
error, and we can first deprecate the non-device support in pyarrow before 
removing it. All to say that AFAIU this is perfectly possible if we want 
it.)</sub>
   
   2) Add separate dunder methods `__arrow_c_device_array__` and 
`__arrow_c_device_stream__`, and then it is up to the producer to implement 
those dunders if they can (and we can strongly recommend doing that, also for 
CPU-only libraries), and to consumers to check which ones are present. 
   
   3) Allow the consumer to request a device array with some keyword (e.g. 
`__arrow_c_array__(device=True)`), which gives the consumer the option to 
request it while still giving the producer the possibility to raise an error 
if they don't (yet) support the device version.
   
   4) Support both options in the current methods without a keyword, i.e. 
allow `__arrow_c_array__` to return either an `"arrow_array"` or an 
`"arrow_device_array"` capsule (the capsule name distinguishes the two), with 
the recommendation to always return a device version if you can, while still 
allowing producers to return a CPU version if they don't support the device 
one. This only gives some flexibility to the producer, and no control to the 
consumer to request the CPU version (so this essentially expects that all 
consumers will handle the device version).
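
   To make the consumer side of options 2/3/4 a bit more concrete, here is a 
rough sketch. Everything in it is hypothetical: the producer classes, the 
`export_option2` / `export_option3` / `capsule_kind` helpers and the fallback 
behaviour are only illustrations, and the capsules wrap a dummy buffer created 
through `ctypes` (CPython only) instead of real Arrow structs.

```python
import ctypes

# PyCapsule helpers via ctypes (CPython only). A real producer would create
# capsules in C/C++/Cython around actual ArrowSchema/ArrowArray structs.
_api = ctypes.pythonapi
_api.PyCapsule_New.restype = ctypes.py_object
_api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
_api.PyCapsule_GetName.restype = ctypes.c_char_p
_api.PyCapsule_GetName.argtypes = [ctypes.py_object]

# A capsule stores its name by pointer, so these bytes objects must stay alive.
SCHEMA = b"arrow_schema"
ARRAY = b"arrow_array"
DEVICE_ARRAY = b"arrow_device_array"
_DUMMY = ctypes.create_string_buffer(8)  # placeholder payload, not a real struct


def _capsule(name):
    """Create a named capsule around the dummy payload (illustration only)."""
    return _api.PyCapsule_New(ctypes.addressof(_DUMMY), name, None)


def capsule_kind(capsule):
    """Option 4: tell CPU and device capsules apart by their capsule name."""
    name = _api.PyCapsule_GetName(capsule)
    if name == DEVICE_ARRAY:
        return "device"
    if name == ARRAY:
        return "cpu"
    raise TypeError(f"unexpected capsule name: {name!r}")


class CpuOnlyProducer:  # hypothetical producer without device support
    def __arrow_c_array__(self, requested_schema=None):
        return _capsule(SCHEMA), _capsule(ARRAY)


class DeviceProducer(CpuOnlyProducer):  # hypothetical, follows option 2
    def __arrow_c_device_array__(self, requested_schema=None):
        return _capsule(SCHEMA), _capsule(DEVICE_ARRAY)


class KeywordProducer:  # hypothetical, follows option 3
    def __arrow_c_array__(self, requested_schema=None, device=False):
        return _capsule(SCHEMA), _capsule(DEVICE_ARRAY if device else ARRAY)


def export_option2(obj):
    """Option 2: prefer the device dunder, fall back to the CPU one."""
    method = getattr(obj, "__arrow_c_device_array__", None) or obj.__arrow_c_array__
    return method()


def export_option3(obj):
    """Option 3: ask for a device array; fall back if the keyword is rejected.

    The fallback is an assumption of this sketch, not part of the proposal.
    """
    try:
        return obj.__arrow_c_array__(device=True)
    except TypeError:
        return obj.__arrow_c_array__()
```

   In all three variants the consumer ends up inspecting what it received, for 
example `capsule_kind(export_option2(DeviceProducer())[1])`; the options mainly 
differ in who gets to choose between the CPU and the device version.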
   
   Options 2/3/4 are probably just variants of how to expose both interfaces, 
and thus the main initial question is whether we want to, long term, move 
towards an ecosystem where everyone uses the C Device Data Interface, or keep 
using both interfaces side by side as the main interchange mechanism (the 
device interface of course still embeds the standard struct). 
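
   As a rough illustration of that embedding, the two structs can be mirrored 
with `ctypes`. The field lists below follow the C Data Interface and C Device 
Data Interface specifications as far as I understand them; the `dev` instance 
and its values are made up for the example, so treat this as a sketch and not 
as a binding.

```python
import ctypes


class ArrowArray(ctypes.Structure):
    """Mirror of `struct ArrowArray` from the C Data Interface."""


# Fields are assigned after the class statement because the struct refers to
# itself through pointers.
ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowArray))),
    ("private_data", ctypes.c_void_p),
]


class ArrowDeviceArray(ctypes.Structure):
    """Mirror of `struct ArrowDeviceArray`: the standard struct comes first."""

    _fields_ = [
        ("array", ArrowArray),             # embedded standard struct
        ("device_id", ctypes.c_int64),
        ("device_type", ctypes.c_int32),   # ArrowDeviceType
        ("sync_event", ctypes.c_void_p),
        ("reserved", ctypes.c_int64 * 3),
    ]


ARROW_DEVICE_CPU = 1

# Made-up values: a CPU "device array" of length 3 with no real buffers.
dev = ArrowDeviceArray()
dev.array.length = 3
dev.device_type = ARROW_DEVICE_CPU
dev.device_id = -1  # convention for device types without an intrinsic id

# Because the standard struct is the first member, a pointer to the device
# struct can also be viewed as a pointer to the embedded ArrowArray.
as_array = ctypes.cast(ctypes.pointer(dev), ctypes.POINTER(ArrowArray)).contents
```

   So a consumer that receives a device array whose `device_type` is CPU can 
still hand the embedded `array` member to existing code written for the 
standard interface.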

