Re: [DISCUSSION] C-Data API for Non-CPU Use Cases

Matt Topol Mon, 10 Apr 2023 12:51:36 -0700

> There's nothing in the spec today that
prevents users from creating `ArrowDeviceArray` and
`ArrowDeviceArrayStream` themselves


True, but third-party applications aren't going to be the only downstream
users of this API. We also want to build on this within Arrow itself to
enable easier usage of non-CPU memory in the higher-level interfaces (IPC,
Flight). Right now there are the buffer and memory manager classes, but
there are plenty of areas that still inspect the buffers instead of first
checking the `is_cpu` flag on them first. Plus we would want to be able to
expose the device_type and device_id information at those higher level
interfaces too.

Even if we don't standardize on the device type list from dlpack, having a
standardized way for libraries to pass this device information alongside
the Arrow Arrays themselves is still beneficial and I think it's better for
us to define it rather than wait for others to do so. The next step after
agreeing on this format change is going to be building helpers (similar to
the existing C Data helpers) around this to ensure safe usage and
conversion to C++ Arrow Array's and buffers, etc.

> Would we rather come up with a mechanism for arbitrary key/value metadata
(e.g. UTF-8 encoded JSON string) to accompany arrays?  Something similar to
record batch metadata in the IPC format?

A generic key/value metadata layer at the Array level for the C-Data API
could work in most cases *except* for the change to `get_next` for passing
the stream/queue pointer object that the producer needs to make the data
safe to access on. If a need for generic metadata at the array level IS
needed in the future, the reserved 128 bytes could be easily evolved into a
const char* + a release callback or whatever else we need in order to
provide other generic metadata in the struct for the Array.

My concerns with using a generic metadata layer to pass this information
are:
1. There would be no strong type enforcement on the device_id/device_type
members, which would require additional error checking on consumers for
invalid values.
2. We'd need to add a release callback or something to the ArrowDeviceArray
struct itself in order to clarify who owns the `const char*` for the
metadata.

> 1. Agreeing that dlpack has a good list of device ids
  * how is this list managed?  how hard is it for a device to get added to
the list?
  * how will updates to dlpack be mirrored into arrow?
  * instead of just repeating/vendoring the enum can we simply refer to it
and treat this as an opaque integer?)

I'm not opposed to just referring to the dlpack enum and treating this as
an opaque integer if that would be preferable. I definitely agree with the
difficulties in vendoring/repeating the dlpack enum values here and
ensuring it stays up to date. Does anyone else have strong feelings one way
or the other on this?

--Matt

On Mon, Apr 10, 2023 at 1:34 PM Weston Pace <weston.p...@gmail.com> wrote:

> I suppose I'm a weak +1 in "I don't entirely believe this will be useful
> but it doesn't seem harmful".  There's nothing in the spec today that
> prevents users from creating `ArrowDeviceArray` and
> `ArrowDeviceArrayStream` themselves and I'm not familiar enough with the
> systems producing / consuming these arrays to know if it makes sense for
> Arrow to standardize on a list.  E.g. picking dlpack's list of device ids.
>
> Arrays and streams could also be labeled with (just brainstorming):
>  * numa node number
>  * worker id (e.g. in distributed execution)
>  * thread id (e.g. in some kind of work-stealing producer/consumer
> algorithm)
>  * source device MAC (e.g. in some kind of IoT environment)
>  * etc.
>
> Would we rather come up with a mechanism for arbitrary key/value metadata
> (e.g. UTF-8 encoded JSON string) to accompany arrays?  Something similar to
> record batch metadata in the IPC format?
>
> Otherwise, it seems what is proposed is:
>
> 1. Agreeing that dlpack has a good list of device ids
>   * how is this list managed?  how hard is it for a device to get added to
> the list?
>   * how will updates to dlpack be mirrored into arrow?
>   * instead of just repeating/vendoring the enum can we simply refer to it
> and treat this as an opaque integer?)
> 2. Providing an example of how you can tag arrays with metadata
>
>
>
> On Mon, Apr 10, 2023 at 9:30 AM Matt Topol <zotthewiz...@gmail.com> wrote:
>
> > > The ArrowArray struct is not allowed to change, as it would break the
> > ABI:
> >
> >
> https://arrow.apache.org/docs/format/CDataInterface.html#updating-this-specification
> >
> > I was referring more to the future case where we might need to introduce
> an
> > `ArrowArrayV2` or something similar precisely because the ArrowArray
> struct
> > is not allowed to change to avoid breaking the ABI. The problem being
> that
> > it becomes difficult to communicate *per-buffer* device info if we are
> > locked into only the existing ArrowArray struct which doesn't provide any
> > way to communicate that information. That said, you did bring up a good
> > point in the PR of the question of ownership of that memory. So it might
> > make more sense to just embed it and if we have to evolve the ABI we can
> > use the reserved bytes.
> >
> > > How exactly would this be communicated? Is the information actually
> > useful? I got the impression that the CUDA programming model allows you
> > to access exactly the right amount of data, regardless of HW parallelism.
> >
> > From my research and discussions, it looks like there's still the case of
> > handling vectorization in GPU programming. If you know the padding of the
> > buffer you know whether or not it's safe to use bytes after the length of
> > the data for the purposes of having equal length buffers that you can
> > perform vectorized operations on (similar to what you might do for CPU
> > programming). For now I've left this out and punted on it as it would be
> > per-buffer information that we aren't communicating. But I wanted to
> > mention the concern in case anyone on the mailing list that is more
> > knowledgeable than I (or has particular use cases in mind) has an opinion
> > on this and thinks we should add this info to be communicated somehow
> > (maybe another struct that we embed, or otherwise manage in the
> > ArrowDeviceArray struct).
> >
> > > The C++ Device class already exists for this. You can get a Buffer's
> > device pretty easily (by going through the MemoryManager, IIRC).
> >
> > I completely missed the Device class, good point! So when I get to
> > implementing helpers for the C++ lib for these format changes, I'd likely
> > create an implementation of that Device class / MemoryManager class for
> > handling devices described by the ArrowDeviceArray's information of the
> > DeviceType + Device ID. Alongside a child class from Buffer for managing
> > the buffers and making sure they return false for `is_cpu`.
> >
> > -Matt
> >
> > On Sat, Apr 8, 2023 at 4:41 AM Antoine Pitrou <anto...@python.org>
> wrote:
> >
> > >
> > > Hi Matt,
> > >
> > > I've posted comments on the PR. Besides:
> > >
> > > >     * The ArrowDeviceArray contains a pointer to an ArrowArray
> > alongside
> > > > the device information related to allocation. The reason for using a
> > > > pointer is so that future modifications of the ArrowArray struct do
> not
> > > > cause the size of this struct to change (as it would still just be a
> > > > pointer to the ArrowArray struct).
> > >
> > > The ArrowArray struct is not allowed to change, as it would break the
> > ABI:
> > >
> > >
> >
> https://arrow.apache.org/docs/format/CDataInterface.html#updating-this-specification
> > >
> > > > Remaining Concerns that I can think of:
> > > >      * Alignment and padding of allocations can have a larger impact
> > when
> > > > dealing with non-cpu devices than with CPUs, and this design provides
> > no
> > > > way to communicate potential extra padding on a per-buffer basis. We
> > > could
> > > > attempt to codify a convention that allocations should have a
> specific
> > > > alignment and a particular padding, but that doesn't actually enforce
> > > > anything nor allow communicating if for some reason those conventions
> > > > weren't followed. Should we add some way of passing this info or punt
> > > this
> > > > for a future modification?
> > >
> > > How exactly would this be communicated? Is the information actually
> > > useful? I got the impression that the CUDA programming model allows you
> > > to access exactly the right amount of data, regardless of HW
> parallelism.
> > >
> > > > This is part of a wider effort I'm attempting to address to
> > > > improve the non-cpu memory support in the Arrow libraries, such as
> > > enhanced
> > > > Buffer types in the C++ library that will have the device_id and
> > > > device_type information in addition to the `is_cpu` flag that
> currently
> > > > exists.
> > >
> > > The C++ Device class already exists for this. You can get a Buffer's
> > > device pretty easily (by going through the MemoryManager, IIRC).
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> >
>

Re: [DISCUSSION] C-Data API for Non-CPU Use Cases

Reply via email to