Hi,

The libgdf project defines a column structure that, in a simplified form,
could be represented as

    typedef struct {
        void *data;                     // column data
        unsigned char *valid;           // validity mask
                                        // one bit per column item
        size_t size;                    // number of items
        enum {INT8, INT16, ...} dtype;  // type of column item
        size_t null_count;              // number of non-valid items
    } my_column_t;

The aim is to implement an IPC protocol for sharing my_column_t data between
host and GPU devices. What would be the most sensible way to do that using
the tools available in the Arrow library?

We are currently considering the following approaches:

1. Re-using Arrow Array (C++): my_column_t and Arrow Array have a one-to-one
   correspondence regarding data content.

2. Defining a new Arrow format MyColumn (using Arrow Tensor as an example):

    table MyColumn {
      /// The type of data contained in a value cell.
      type: Type;
      /// The number of non-valid items
      null_count: long;
      /// The location and size of the column's data
      data: Buffer;
      /// The location and size of the column's mask
      valid: Buffer;
    }

We are uncertain which approach would be easiest to implement and maintain,
which would be efficient (0-copy), or whether either makes sense at all.

Defining Arrow MyColumn seems appealing because Arrow Tensor has about
7 times less code than Arrow Array. However, Arrow Array already includes
a validity mask.

What do you think?

Best regards,
Pearu
