jorisvandenbossche commented on issue #38325: URL: https://github.com/apache/arrow/issues/38325#issuecomment-2029775202
Practical experimentation will help inform the decisions we have to make here regarding the control of cross-device copies (e.g. would a `requested_device` keyword be useful?). Therefore, I would like to suggest that we start with a minimal addition (just the new methods as currently described in https://github.com/apache/arrow/pull/40708, without further keywords), and get the implementation for pyarrow merged for 16.0. The guidelines / recommendations section can be updated later as we gain experience with the first implementations.

Based on the above discussion, I would add the following to the PR:

- The `device` protocol methods should return data as-is on the device it is currently on (i.e. the expectation is that no cross-device copy happens in this method). (Sidenote: of course, if someone implemented a tabular object that could use different devices for different columns, this "no device copy" guarantee could not be made, given that the resulting structure's data should live on a single device. But that seems a corner case not worth mentioning in (and complicating) the spec?)
- A device-aware producer _can_ implement `__arrow_c_array/stream__` that does an implicit device-to-CPU copy when called. (This means that a consumer supporting multiple devices (like pyarrow) should always check the device protocol methods first, before the CPU-only versions. Checking my PR implementing this for pyarrow (https://github.com/apache/arrow/pull/40717), I see I need to update it for that.)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
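
For illustration only, the consumer-side ordering suggested in the second point of the comment (check the device protocol methods before the CPU-only ones) could look roughly like the sketch below. It assumes the device methods are named `__arrow_c_device_array__` and `__arrow_c_device_stream__`, as proposed in the linked spec PR. The helper `export_arrow` and the two producer classes are hypothetical, and plain tuples stand in for the real PyCapsule objects. This is not the pyarrow implementation.

```python
# Lookup order: device-aware methods first, CPU-only methods as fallback.
# Assumption: method names follow the proposal in apache/arrow#40708.
_PROTOCOL_ORDER = (
    ("__arrow_c_device_array__", "device_array"),
    ("__arrow_c_device_stream__", "device_stream"),
    ("__arrow_c_array__", "cpu_array"),
    ("__arrow_c_stream__", "cpu_stream"),
)


def export_arrow(obj):
    """Hypothetical helper: return (kind, result) from the first protocol
    method that `obj` implements, preferring the device-aware ones."""
    for name, kind in _PROTOCOL_ORDER:
        method = getattr(obj, name, None)
        if method is not None:
            return kind, method()
    raise TypeError(f"{type(obj).__name__} implements no Arrow export protocol")


class GpuProducer:
    """Hypothetical device-aware producer (tuples stand in for PyCapsules)."""

    def __arrow_c_device_array__(self, requested_schema=None):
        # Data is handed over as-is, on the device it currently lives on.
        return ("schema", "array left on device")

    def __arrow_c_array__(self, requested_schema=None):
        # CPU-only entry point: would imply an implicit device-to-CPU copy.
        return ("schema", "array copied to CPU")


class CpuProducer:
    """Hypothetical CPU-only producer."""

    def __arrow_c_array__(self, requested_schema=None):
        return ("schema", "array on CPU")


gpu_kind, gpu_result = export_arrow(GpuProducer())
cpu_kind, cpu_result = export_arrow(CpuProducer())
```

With this ordering, `GpuProducer` is exported through its device method, so the implicit copy in `__arrow_c_array__` is never triggered, while `CpuProducer` still works through the CPU-only fallback.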
