Re: [I] [Python] Expose the device interface through the Arrow PyCapsule protocol [arrow]

via GitHub Tue, 26 Mar 2024 14:53:27 -0700


jorisvandenbossche commented on issue #38325:
URL: https://github.com/apache/arrow/issues/38325#issuecomment-2021536856


   > Assuming we make copies implicit via `__arrow_c_array__`, the only copy 
we'll support for now is device --> CPU. What would addressing more generic 
copy requests later look like?
   
   I assume we could have something like 
`obj.__arrow_device_array__(requested_device=kCPU)`, and then it is up to the 
producer to see if they can provide the data on that device, and if not, error 
or return on the native device (depending on whether the requested device should 
be followed strictly; for `requested_schema` we decided this was only best effort).
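
A minimal sketch of how a producer might handle such a request. Everything here is hypothetical: the `requested_device` keyword is only the idea floated above, `ToyProducer`, `copyable_to` and `strict` are invented for illustration, and the method returns a plain device label instead of real PyCapsules so the decision logic stays visible.

```python
# Stand-in device labels; real code would use the device type codes
# from the Arrow C Device Data Interface.
CPU = "cpu"
CUDA = "cuda"


class ToyProducer:
    """Hypothetical producer, not a real library class."""

    def __init__(self, native_device, copyable_to=(), strict=True):
        self.native_device = native_device
        self.copyable_to = set(copyable_to)  # devices this producer can copy to
        self.strict = strict  # whether an unmet request is an error

    def __arrow_device_array__(self, requested_schema=None, requested_device=None):
        # No request, or the data already lives there: export without copying.
        if requested_device is None or requested_device == self.native_device:
            return self.native_device
        # The producer can honor the request (a real one would copy here).
        if requested_device in self.copyable_to:
            return requested_device
        # The request cannot be honored: error out or fall back to native.
        if self.strict:
            raise ValueError(
                f"cannot provide data on {requested_device!r}; "
                f"data lives on {self.native_device!r}"
            )
        return self.native_device
```

With `strict=False` the request becomes best effort, which mirrors how `requested_schema` is treated; the consumer then has to check which device it actually received.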
   
   > So perhaps we're overthinking this, and producers of non-CPU data should 
simply implement the C Data Interface protocol with implicit cross-device 
copies.
   
   That means that passing such a non-CPU object, like a cudf DataFrame, to an 
interface that can consume data through this protocol (e.g. pandas or polars 
constructors, a duckdb query with an implicit variable, ...) would automatically 
do a potentially costly device copy of the full data structure. I am a bit 
hesitant to enable that implicitly, since it might be unexpected in some cases 
(although maybe also convenient).
   
   > If so, this begs the question: should there be a more robust mechanism to 
add optional arguments after some producer implementations have already been 
published?
   
   I assume the simple but verbose way to do this is to put the onus on the 
_consumer_: if they want to use a newer keyword, they need to do that in a 
try/except, falling back on the call without the keyword, so that it works 
both for producers that support the new keyword and for those that do not yet.
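
A sketch of that consumer-side fallback, under stated assumptions: `requested_device` is the hypothetical keyword from above, and `export_device_array`, `OldProducer` and `NewProducer` are invented names that return plain strings instead of capsules.

```python
def export_device_array(obj, requested_device=None):
    """Call the protocol method, retrying without the newer keyword if needed."""
    if requested_device is None:
        return obj.__arrow_device_array__()
    try:
        return obj.__arrow_device_array__(requested_device=requested_device)
    except TypeError:
        # Producer predates the keyword: fall back to the older signature.
        # The consumer must then inspect the device of whatever it gets back.
        return obj.__arrow_device_array__()


class OldProducer:
    # Published before the keyword existed.
    def __arrow_device_array__(self, requested_schema=None):
        return "cuda"


class NewProducer:
    # Understands the (hypothetical) keyword and honors it.
    def __arrow_device_array__(self, requested_schema=None, requested_device=None):
        return requested_device or "cuda"


old_result = export_device_array(OldProducer(), requested_device="cpu")
new_result = export_device_array(NewProducer(), requested_device="cpu")
```

One caveat of this pattern: a `TypeError` raised for an unrelated reason inside the producer is also caught, and the retry may hide it. A stricter consumer could check the signature with `inspect.signature` before calling.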
   
   (We could in theory already add a catch-all `**kwargs` to the protocol 
methods, but that would then silently ignore new keywords not yet supported 
by a given producer, so I am not sure that is better than raising an error.)
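
The trade-off can be illustrated with two toy producers. Both classes are hypothetical and return a device label rather than capsules; `requested_device` is again the keyword imagined above, which neither producer knows about.

```python
class KwargsProducer:
    # Accepts anything, so unknown keywords are dropped without notice.
    def __arrow_device_array__(self, requested_schema=None, **kwargs):
        return "cuda"


class ExplicitProducer:
    # Fixed signature, so unknown keywords raise TypeError.
    def __arrow_device_array__(self, requested_schema=None):
        return "cuda"


# The request for CPU data is silently ignored.
silent_result = KwargsProducer().__arrow_device_array__(requested_device="cpu")

# The same request fails loudly, so the consumer knows it was not honored.
try:
    ExplicitProducer().__arrow_device_array__(requested_device="cpu")
    explicit_failed = False
except TypeError:
    explicit_failed = True
```

In the first case the consumer receives data on a device it did not ask for and only finds out if it checks the result; in the second it learns immediately that the keyword is unsupported.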
   
   


