Thanks!

We have already implemented GPU IPC for CUDA:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/gpu/cuda_arrow_ipc.h

Is it possible to use these APIs? If not, what could be changed or added to
allow you to? I don't think it's worthwhile to maintain an alternative
implementation of the IPC protocol in a third party package. The results
can be converted to the C data structure that you listed.
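
As a rough illustration of that last point, here is a minimal sketch in C of wrapping already-available buffers in the quoted column structure without copying. It assumes the buffers are host-accessible and that the validity bitmap follows the Arrow convention (least-significant bit first, a set bit means the item is valid). The names `my_dtype_t`, `count_nulls`, and `wrap_buffers` are hypothetical and appear in neither Arrow nor libgdf, and the enum lists only a few illustrative types because the original list is elided.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of dtypes; the enum in the thread is elided ("..."). */
typedef enum { INT8, INT16, INT32 } my_dtype_t;

/* Simplified column structure, as described in the quoted message. */
typedef struct {
    void *data;            /* column data */
    unsigned char *valid;  /* validity mask, one bit per column item */
    size_t size;           /* number of items */
    my_dtype_t dtype;      /* type of column item */
    size_t null_count;     /* number of non-valid items */
} my_column_t;

/* Count items whose validity bit is 0.
 * Assumes an Arrow-style bitmap: LSB-first, set bit = valid.
 * A NULL bitmap is treated as "all items valid". */
static size_t count_nulls(const unsigned char *valid, size_t size) {
    size_t nulls = 0;
    if (valid == NULL) {
        return 0;
    }
    for (size_t i = 0; i < size; ++i) {
        if (!((valid[i / 8] >> (i % 8)) & 1u)) {
            ++nulls;
        }
    }
    return nulls;
}

/* Wrap existing buffers in a my_column_t. No data is copied: the struct
 * only borrows the pointers, so the caller must keep the buffers alive. */
static my_column_t wrap_buffers(void *data, unsigned char *valid,
                                size_t size, my_dtype_t dtype) {
    my_column_t col;
    assert(data != NULL || size == 0);
    col.data = data;
    col.valid = valid;
    col.size = size;
    col.dtype = dtype;
    col.null_count = count_nulls(valid, size);
    return col;
}
```

For device-resident buffers the pointers could be stored the same way, but the bitmap could not be read from host code like this; the null count would have to come from metadata or be computed on the device.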

Wes

On Wed, Aug 22, 2018, 4:56 PM Pearu Peterson <pearu.peter...@quansight.com>
wrote:

> Hi Wes,
>
> Yes, sorry for the mess. Here is the message in plain text:
>
> The libgdf project defines a column structure that in a simplified form
> could be represented as
>
> typedef struct {
>     void *data;                          // column data
>     unsigned char *valid;          // validity mask, one bit per column item
>     size_t size;                   // number of items
>     enum {INT8, INT16, ...} dtype; // type of column item
>     size_t null_count;             // number of non-valid items
> } my_column_t;
>
> The aim is to implement an IPC protocol for sharing my_column_t data between
> host and GPU devices.
>
> What would be the most sensible way to do that using the tools available in
> the Arrow library?
>
> We are currently considering the following approaches:
>
> 1. Re-using Arrow Array: my_column_t and Arrow Array have one-to-one
> correspondence regarding data content.
>
> 2. Defining a new Arrow format, MyColumn (using Arrow Tensor as an example):
>
> table MyColumn {
>   /// The type of data contained in a value cell.
>   type: Type;
>   /// The number of non-valid items
>   null_count: long;
>   /// The location and size of the column's data
>   data: Buffer;
>   /// The location and size of the column's mask
>   valid: Buffer;
> }
>
> We are uncertain which approach would be easiest to implement and maintain,
> which would be efficient (zero-copy), or whether either makes sense at all.
>
> Defining an Arrow MyColumn seems appealing because Arrow Tensor has about 7
> times less code than Arrow Array. However, Arrow Array already includes a
> validity mask.
>
> What do you think?
>
> Best regards,
> Pearu
>
>
> On Wed, Aug 22, 2018 at 11:53 PM, Wes McKinney <wesmck...@gmail.com>
> wrote:
>
> > Hi Pearu,
> >
> > Seems the formatting of your email got messed up a little bit. Can you
> > resend with some more line breaks?
> >
> > Thanks
> >
> >
> > On Wed, Aug 22, 2018, 4:46 PM Pearu Peterson <
> pearu.peter...@quansight.com
> > >
> > wrote:
> >
> > > *Hi,The libgdf project defines a column structure that in a simplified
> > form
> > > could be represented astypedef struct {    void *data;
> > //
> > > column data    unsigned char *valid; // validity mask // one bit per
> > column
> > > item    size_t size;                 // nof items    enum {INT8, INT16,
> > > ...} dtype; // type of column item    size_t null_count;           //
> nof
> > > non-valid items} my_column_t;The aim is to implement IPC protocol for
> > > sharing my_column_t data between host and GPU devices. What would be
> the
> > > most sensible way to do that using tools available in Arrow library?We
> > are
> > > currently considering the following approaches:1. Re-using Arrow Array
> > > (C++): my_column_t and Arrow Array have one-to-one correspondence
> > regarding
> > > data content.2. Defining new Arrow format MyColumn (using Arrow Tensor
> as
> > > an example):table MyColumn {  /// The type of data contained in a value
> > > cell.  type: Type;  /// The number of non-valid items  null_count:
> long;
> > >  /// The location and size of the column's data  data: Buffer;  /// The
> > > location and size of the column's mask  valid: Buffer;}We are uncertain
> > > which approach would be easiest to implement and maintain, be efficient
> > > (0-copy), or would make sense at all.Defining Arrow MyColumn seems
> > > appealing because of about 7 times less code in Arrow Tensor than in
> > Arrow
> > > Array. However, Arrow Array includes validity mask already.What do you
> > > think?Best regards,Pearu*
> > >
> >
>
