jorisvandenbossche commented on issue #38325: URL: https://github.com/apache/arrow/issues/38325#issuecomment-2020015495
> I do think we should start discussing what it looks like for a CPU-only library to request data from a non-CPU library.

Initially I would expect this to raise an error (i.e. by default, cross-device copies are indeed not allowed). In your example, pandas would check the device, see that it is not CPU, and therefore raise an error saying that creating a `pandas.DataFrame` from non-CPU data is not possible.

But it's a good point that we should at least consider this case and decide whether we want to support more. If we want to make a cross-device copy possible, would the idea be that we let the consumer specify a "requested device type" (like we have a requested schema), so that the producer can do the copy? There might be use cases for enabling this as an opt-in.

For the example of cudf -> pandas (or -> polars, duckdb, or any other CPU-only library), if a user actually wants the data to be copied, pandas cannot do this itself; it would be cudf that needs to perform the copy. So if we want to allow that through this interface, there needs to be a way to signal it.

Of course, we can (initially) say that this interface doesn't support that. But that does mean that if pandas wants to support ingesting (copying) non-CPU data generically, without being tied to a specific library, that is not really possible: it would first need to do the device-to-host copy using the passed object's own APIs (e.g. for a cudf DataFrame, calling some cudf-specific method to copy it to CPU memory), losing the benefits of a generic protocol.
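To make the two behaviours discussed here concrete, below is a pure-Python mock, not real protocol code. A CPU-only consumer refuses non-CPU data by default, and can opt in to asking the producer for a CPU copy through a "requested device". Everything here is hypothetical: `HypotheticalGpuTable`, `export`, `requested_device`, `cpu_consumer_ingest` and `allow_copy` are illustrative names, not part of any Arrow, pandas or cudf API. Real exports would pass PyCapsules, not Python lists. The only grounded detail is that the device codes 1 and 2 mirror `ARROW_DEVICE_CPU` and `ARROW_DEVICE_CUDA` from the Arrow C Device Data Interface.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class DeviceType(IntEnum):
    # Values mirror ARROW_DEVICE_CPU / ARROW_DEVICE_CUDA in the
    # Arrow C Device Data Interface.
    CPU = 1
    CUDA = 2


@dataclass
class ExportedData:
    # Hypothetical stand-in for an exported device array: where the
    # buffers live, plus the values themselves (a plain list here).
    device_type: DeviceType
    values: list


class HypotheticalGpuTable:
    """Mock producer (think: cudf) whose data lives on a GPU."""

    def __init__(self, values):
        self._values = list(values)

    def export(self, requested_device: Optional[DeviceType] = None) -> ExportedData:
        # Without a request, export in place on the producer's own device
        # (zero-copy: the same list object is handed out).
        if requested_device is None or requested_device == DeviceType.CUDA:
            return ExportedData(DeviceType.CUDA, self._values)
        if requested_device == DeviceType.CPU:
            # Only the producer knows how to do the device-to-host copy;
            # copying the list stands in for a real device-to-host transfer.
            return ExportedData(DeviceType.CPU, list(self._values))
        raise NotImplementedError(f"cannot export to {requested_device!r}")


def cpu_consumer_ingest(obj, allow_copy: bool = False) -> list:
    """Mock CPU-only consumer (think: pandas)."""
    if allow_copy:
        # Opt-in: ask the producer to deliver CPU-resident data.
        exported = obj.export(requested_device=DeviceType.CPU)
    else:
        exported = obj.export()
    if exported.device_type != DeviceType.CPU:
        # Default behaviour: refuse non-CPU data instead of copying it.
        raise ValueError(
            f"cannot create a CPU object from data on {exported.device_type.name}"
        )
    return exported.values


table = HypotheticalGpuTable([1, 2, 3])
try:
    cpu_consumer_ingest(table)
except ValueError as exc:
    print(exc)  # cannot create a CPU object from data on CUDA
print(cpu_consumer_ingest(table, allow_copy=True))  # [1, 2, 3]
```

In this mock, the copy is only ever performed by the producer. The consumer merely states where it needs the data. This is the asymmetry described above: pandas cannot copy GPU memory itself, but it can signal a device request through a generic protocol.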
