pitrou commented on issue #38325: URL: https://github.com/apache/arrow/issues/38325#issuecomment-2012549275
> * For a CPU-only library, it is encouraged to implement both the standard and device version of the protocol methods (i.e. both `__arrow_c_array__` and `__arrow_c_device_array__`, and/or both `__arrow_c_stream__` and `__arrow_c_device_stream__`)

+0. I'm not sure it makes sense to ask producers for more effort in this regard.

> * The presence of only the standard version (e.g. only `__arrow_c_array__` and not `__arrow_c_device_array__`) means that this is a CPU-only data object.

+1

> * For a device-aware library, and for data structures that can only reside in non-CPU memory, you should _only_ implement the device version of the protocol (e.g. only add `__arrow_c_device_array__`, and never add a `__arrow_c_array__`)

+1

> * Libraries can of course have data structures that can live on both CPU or non-CPU, and for those it is fine that they implement both versions (and error in the non-device version if the data is not on the CPU)?

+1

> EDIT: this _has_ to be fine of course, given that pyarrow is in this situation, and we want to define both methods. But should we error in `__arrow_c_array__` for non-CPU data? (right now we don't actually check the device here, but silently return an ArrowArray struct with null buffer pointers)

Yes, we should. The expectation of the (regular) C Data Interface is that data lives on the CPU.

> * Do we want to say something about expectations that no cross-device copies happen?

In the producer or in the consumer? IMHO the consumer is free to do whatever suits them. On the producer side the question is a bit more delicate. Perhaps we need to pass some options to `__arrow_c_device_array__`.
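
To make the "error in `__arrow_c_array__` for non-CPU data" point concrete, here is a minimal sketch of a producer that implements both protocol methods. The class `HypotheticalArray`, its `device_type` attribute and the `_export` helper are assumptions made up for illustration, not pyarrow code. The returned tuples are string placeholders; a real producer would return a pair of PyCapsules wrapping the schema and array structs. The device constants are assumed to mirror `ARROW_DEVICE_CPU` and `ARROW_DEVICE_CUDA` from the C Device Data Interface header.

```python
# Device type values, assumed to match the Arrow C Device Data Interface.
ARROW_DEVICE_CPU = 1
ARROW_DEVICE_CUDA = 2


class HypotheticalArray:
    """Toy producer whose data may live in CPU or non-CPU memory."""

    def __init__(self, values, device_type=ARROW_DEVICE_CPU):
        self.values = list(values)
        self.device_type = device_type

    def _export(self, kind):
        # Placeholder export: a real implementation would build and return
        # (schema_capsule, array_capsule) PyCapsule objects here.
        return (f"{kind}-schema", f"{kind}-array")

    def __arrow_c_array__(self, requested_schema=None):
        # The regular C Data Interface expects CPU data, so refuse to export
        # device data instead of silently returning unusable buffers.
        if self.device_type != ARROW_DEVICE_CPU:
            raise ValueError(
                "data is not in CPU memory; use __arrow_c_device_array__"
            )
        return self._export("arrow_array")

    def __arrow_c_device_array__(self, requested_schema=None, **kwargs):
        # The device variant works wherever the data lives, CPU included.
        return self._export("arrow_device_array")


cpu_data = HypotheticalArray([1, 2, 3])
gpu_data = HypotheticalArray([1, 2, 3], device_type=ARROW_DEVICE_CUDA)

cpu_data.__arrow_c_array__()         # fine: data is on the CPU
gpu_data.__arrow_c_device_array__()  # fine: device-aware export
try:
    gpu_data.__arrow_c_array__()
except ValueError as exc:
    print(f"refused: {exc}")
```

Whether the device method should take extra options (for example to control cross-device copies) is left open here, which is why the sketch only accepts `**kwargs` without interpreting them.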
