Re: [I] [Python] Expose the device interface through the Arrow PyCapsule protocol [arrow]

via GitHub Tue, 26 Mar 2024 03:08:23 -0700


jorisvandenbossche commented on issue #38325:
URL: https://github.com/apache/arrow/issues/38325#issuecomment-2020015495


   > I do think we should start discussing what does it look like for a 
CPU-only library to request data from a non-CPU library.
   
   Initially I would expect this to raise an error (i.e. by default indeed not 
allowing cross-device copies). In your example, pandas would check the device, 
see that it is not CPU, and therefore raise an error saying that creating a 
pandas.DataFrame from non-CPU data is not possible. 
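   
   A minimal sketch of that default behaviour, to make the idea concrete. 
`FakeDeviceTable` and `consume_cpu_only` are hypothetical names for 
illustration only (not pandas, cudf or pyarrow APIs); the two integer 
constants are the device type codes from the Arrow C Device Data Interface 
header.

```python
# Device type codes as defined in the Arrow C Device Data Interface header.
ARROW_DEVICE_CPU = 1
ARROW_DEVICE_CUDA = 2


class FakeDeviceTable:
    """Hypothetical producer object; stands in for e.g. a GPU dataframe."""

    def __init__(self, device_type, values):
        self.device_type = device_type
        self.values = values


def consume_cpu_only(obj):
    """Hypothetical CPU-only consumer: error out rather than copy."""
    if obj.device_type != ARROW_DEVICE_CPU:
        raise ValueError(
            f"cannot create a CPU object from device type {obj.device_type}; "
            "cross-device copies are not performed by default"
        )
    # On CPU the data can be used directly (here simply materialized as a list).
    return list(obj.values)
```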
   
   But it's a good point that we at least should consider this case and decide 
whether we want to support more.
   If we want to make a cross-device copy possible, the idea is that we let the 
consumer specify a "requested device type" (like we have a requested schema), 
so that the producer can do the copy?
   
   There might be use cases for enabling it as opt-in. For the example of cudf 
-> pandas (or -> polars, or duckdb, or any other CPU-only library), if a user 
actually wants the data to be copied, pandas cannot do this itself, and it 
would be cudf that needs to perform the copy. So if we want to allow that 
through this interface, there needs to be a way to signal that. 
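   
   One possible shape for such a signal, purely as a sketch: the consumer 
passes a requested device type and the producer performs the copy. 
`FakeGpuProducer`, `export` and the `requested_device_type` keyword are 
assumptions made up for this example; none of them are part of the protocol 
or of cudf.

```python
# Device type codes as defined in the Arrow C Device Data Interface header.
ARROW_DEVICE_CPU = 1
ARROW_DEVICE_CUDA = 2


class FakeGpuProducer:
    """Hypothetical producer whose data lives on a CUDA device."""

    def __init__(self, values):
        self._values = list(values)
        self.device_type = ARROW_DEVICE_CUDA

    def _copy_to_host(self):
        # Stand-in for a real device-to-host copy done by the producer.
        return list(self._values)

    def export(self, requested_device_type=None):
        """Return (device_type, data); copy only if the consumer opted in."""
        if requested_device_type in (None, self.device_type):
            # No request, or the data is already on the requested device.
            return self.device_type, self._values
        if requested_device_type == ARROW_DEVICE_CPU:
            return ARROW_DEVICE_CPU, self._copy_to_host()
        raise ValueError(
            f"cannot export to device type {requested_device_type}"
        )
```

   With this shape a CPU-only consumer would call 
`export(requested_device_type=ARROW_DEVICE_CPU)` and never need to know 
which library produced the data.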
   
   Of course we can (initially) say that this interface doesn't support that. 
But that does mean that if pandas wants to support ingesting (copying) non-CPU 
data generically, not tied to a specific library, that's not really possible, 
because it would first need to do the device-to-host copy using the passed 
object's APIs (e.g. for a cudf DataFrame, call some cudf-specific method to 
copy that to CPU memory), losing the benefits of a generic protocol.

