Hi Matt,

I've posted comments on the PR. Besides that:
> * The ArrowDeviceArray contains a pointer to an ArrowArray alongside the device information related to the allocation. The reason for using a pointer is so that future modifications of the ArrowArray struct do not change the size of this struct (it would still just hold a pointer to an ArrowArray).
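The ArrowArray struct is not allowed to change, as that would break the ABI: https://arrow.apache.org/docs/format/CDataInterface.html#updating-this-specification
So embedding it by value cannot make the size of ArrowDeviceArray depend on future ArrowArray changes, and the pointer indirection is unnecessary.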
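To make the ABI point concrete, here is a minimal sketch that mirrors the struct layouts with Python's `ctypes`. The `ArrowArray` fields follow the C Data Interface specification; `DeviceArrayByPointer`, `DeviceArrayByValue`, and the int64/int32 widths chosen for `device_id`/`device_type` are hypothetical, for illustration only. Because the specification freezes `ArrowArray`, `ctypes.sizeof(ArrowArray)` is a fixed constant on a given platform, so the by-value layout is just as size-stable as the by-pointer one.

```python
import ctypes


class ArrowArray(ctypes.Structure):
    """Mirror of the frozen ArrowArray struct from the C Data Interface."""


# Fields are assigned after the class exists because the struct refers to itself.
ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowArray))),
    ("private_data", ctypes.c_void_p),
]

# Hypothetical width for the device type field (assumption, not from the thread).
ArrowDeviceType = ctypes.c_int32


class DeviceArrayByPointer(ctypes.Structure):
    """Hypothetical layout: the ArrowArray is held through a pointer."""
    _fields_ = [
        ("array", ctypes.POINTER(ArrowArray)),
        ("device_id", ctypes.c_int64),
        ("device_type", ArrowDeviceType),
    ]


class DeviceArrayByValue(ctypes.Structure):
    """Hypothetical layout: the frozen ArrowArray is embedded by value."""
    _fields_ = [
        ("array", ArrowArray),
        ("device_id", ctypes.c_int64),
        ("device_type", ArrowDeviceType),
    ]


if __name__ == "__main__":
    # Sizes depend on the platform's pointer width, but never on future
    # ArrowArray changes, because the ArrowArray layout is frozen by the ABI.
    for cls in (DeviceArrayByPointer, DeviceArrayByValue):
        print(cls.__name__, "size:", ctypes.sizeof(cls),
              "device_id offset:", cls.device_id.offset)
```

The by-value variant is larger, but its size can only change if the wrapper's own fields change, which a pointer indirection would not prevent anyway; it also spares the producer from managing a second struct's lifetime.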
> Remaining concerns that I can think of:
> * Alignment and padding of allocations can have a larger impact on non-CPU devices than on CPUs, and this design provides no way to communicate potential extra padding on a per-buffer basis. We could attempt to codify a convention that allocations should have a specific alignment and a particular padding, but that wouldn't actually enforce anything, nor allow communicating that those conventions weren't followed for some reason. Should we add some way of passing this info, or punt on this until a future revision?
How exactly would this be communicated? Is the information actually useful? I got the impression that the CUDA programming model lets you access exactly the right amount of data, regardless of hardware parallelism.
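To make the question concrete: today a consumer sees only each buffer's start address (through `ArrowArray.buffers`), so it can check alignment but cannot tell how much padding follows the data. The sketch below shows what a convention check could look like if per-buffer capacity were communicated. The helpers `padded_size`, `is_aligned`, `meets_convention` and the `BufferLayout` descriptor with its `capacity` field are all hypothetical. The 64-byte default follows the Arrow columnar format's recommendation of 8- or 64-byte alignment and padding; nothing in the C Data Interface enforces it.

```python
from dataclasses import dataclass

# Assumed convention: 64 bytes, per the columnar format's recommendation.
DEFAULT_ALIGNMENT = 64


def padded_size(nbytes: int, alignment: int = DEFAULT_ALIGNMENT) -> int:
    """Round nbytes up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (nbytes + alignment - 1) & ~(alignment - 1)


def is_aligned(address: int, alignment: int = DEFAULT_ALIGNMENT) -> bool:
    """True if the buffer's start address is a multiple of alignment."""
    return address % alignment == 0


@dataclass(frozen=True)
class BufferLayout:
    """Hypothetical per-buffer descriptor; not part of any Arrow struct."""
    address: int   # start address: visible today via ArrowArray.buffers
    length: int    # bytes of valid data: implied by the array's length/offsets
    capacity: int  # bytes actually allocated: NOT visible to a consumer today


def meets_convention(buf: BufferLayout, alignment: int = DEFAULT_ALIGNMENT) -> bool:
    """Check a buffer against an 'aligned start, padded allocation' convention."""
    return (is_aligned(buf.address, alignment)
            and buf.capacity >= padded_size(buf.length, alignment))


if __name__ == "__main__":
    print(padded_size(100))  # 128
    print(meets_convention(BufferLayout(address=0x1000, length=100, capacity=128)))  # True
    print(meets_convention(BufferLayout(address=0x1000, length=100, capacity=100)))  # False
```

The last check is only possible because `capacity` is supplied; with the address alone, a consumer could verify alignment but would have to trust that the padding convention was followed.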
> This is part of a wider effort to improve non-CPU memory support in the Arrow libraries, such as enhanced Buffer types in the C++ library that carry device_id and device_type information in addition to the existing `is_cpu` flag.
The C++ Device class already exists for this. You can get a Buffer's device pretty easily (by going through the MemoryManager, IIRC).
Regards

Antoine.