Hi Wes,

Yes, sorry for the mess. Here is the message in plain text:

The libgdf project defines a column structure that, in a simplified form,
could be represented as

    typedef struct {
      void *data;                     // column data
      unsigned char *valid;           // validity mask, one bit per column item
      size_t size;                    // number of items
      enum {INT8, INT16, ...} dtype;  // type of column item
      size_t null_count;              // number of non-valid items
    } my_column_t;

The aim is to implement an IPC protocol for sharing my_column_t data between
host and GPU devices. What would be the most sensible way to do that using
the tools available in the Arrow library?

We are currently considering the following approaches:

1. Re-using Arrow Array: my_column_t and Arrow Array have a one-to-one
   correspondence regarding data content.

2. Defining a new Arrow format MyColumn (using Arrow Tensor as an example):

    table MyColumn {
      /// The type of data contained in a value cell.
      type: Type;
      /// The number of non-valid items
      null_count: long;
      /// The location and size of the column's data
      data: Buffer;
      /// The location and size of the column's mask
      valid: Buffer;
    }

We are uncertain which approach would be easiest to implement and maintain,
be efficient (0-copy), or would make sense at all. Defining Arrow MyColumn
seems appealing because Arrow Tensor has about 7 times less code than Arrow
Array. However, Arrow Array already includes a validity mask.

What do you think?

Best regards,
Pearu

On Wed, Aug 22, 2018 at 11:53 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> Hi Pearu,
>
> Seems the formatting of your email got messed up a little bit. Can you
> resend with some more line breaks?
>
> Thanks
>
> On Wed, Aug 22, 2018, 4:46 PM Pearu Peterson <pearu.peter...@quansight.com>
> wrote:
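
P.S. To make the role of the validity mask and null_count in my_column_t more concrete, here is a minimal sketch in C. It rests on assumptions that are not stated in the message itself: that the mask uses the same layout as Arrow validity bitmaps (item i is bit i%8 of byte i/8, least-significant bit first, and a set bit means the item is valid), and that a NULL mask means all items are valid. The names my_dtype_t, my_column_is_valid and my_column_count_nulls are hypothetical and are not part of libgdf or Arrow.

```c
#include <stddef.h>
#include <stdint.h>

/* The dtype enum is elided in the message ("INT8, INT16, ..."), so only
   the two members that are actually shown there are listed here.
   my_dtype_t is a hypothetical name used for illustration. */
typedef enum { INT8, INT16 } my_dtype_t;

/* Simplified column structure, as given in the message. */
typedef struct {
    void *data;            /* column data */
    unsigned char *valid;  /* validity mask, one bit per column item */
    size_t size;           /* number of items */
    my_dtype_t dtype;      /* type of column item */
    size_t null_count;     /* number of non-valid items */
} my_column_t;

/* Returns 1 if item i is valid, 0 otherwise.
   Assumption: LSB-first bit order, set bit = valid (as in Arrow bitmaps).
   Assumption: a NULL mask means that every item is valid. */
static int my_column_is_valid(const my_column_t *col, size_t i)
{
    if (col->valid == NULL) {
        return 1;
    }
    return (col->valid[i >> 3] >> (i & 7)) & 1;
}

/* Recomputes the number of non-valid items by scanning the mask.
   Only the first col->size bits are inspected; padding bits are ignored. */
static size_t my_column_count_nulls(const my_column_t *col)
{
    size_t nulls = 0;
    for (size_t i = 0; i < col->size; ++i) {
        if (!my_column_is_valid(col, i)) {
            ++nulls;
        }
    }
    return nulls;
}
```

If the layout assumption holds, the data and valid pointers would correspond to the two buffers of a primitive Arrow Array (values and validity bitmap), which is the one-to-one correspondence that approach 1 relies on.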