To add to this, take a look at the C interface functions in pyarrow.

Reconstruct pyarrow.DataType from C ArrowSchema

https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/types.pxi#L203

Reconstruct pyarrow.Array from C ArrowArray

https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/array.pxi#L1176

The idea is that a single ArrowSchema may describe a sequence of
ArrowArray structs, so the data type (equivalently, the schema) is
represented separately from the array data.
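
To make the split concrete, here is a minimal sketch (not taken from
pyarrow) that fills in both structs for an int64 array using only ctypes
from the standard library. The field order follows the struct definitions
in CDataInterface.rst; the helper name make_int64_array and the no-op
release callbacks are my own assumptions for illustration, and the sketch
assumes Python keeps ownership of the memory.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    pass


class ArrowArray(ctypes.Structure):
    pass


# Release callback signatures: void (*release)(struct X*)
SchemaRelease = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))
ArrayRelease = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowArray))

# Field order mirrors the struct definitions in CDataInterface.rst.
ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_char_p),
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", SchemaRelease),
    ("private_data", ctypes.c_void_p),
]
ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ArrayRelease),
    ("private_data", ctypes.c_void_p),
]


@SchemaRelease
def release_schema(ptr):
    # Python owns the memory, so there is nothing to free here;
    # setting release to NULL marks the struct as released.
    ptr.contents.release = SchemaRelease()


@ArrayRelease
def release_array(ptr):
    ptr.contents.release = ArrayRelease()


def make_int64_array(values):
    """Hypothetical helper: describe `values` as a non-nullable-data int64 array."""
    data = (ctypes.c_int64 * len(values))(*values)
    # Buffer 0 is the validity bitmap (may be NULL when null_count is 0);
    # buffer 1 holds the int64 values themselves.
    buffers = (ctypes.c_void_p * 2)(None, ctypes.addressof(data))

    schema = ArrowSchema()        # remaining fields stay zero / NULL
    schema.format = b"l"          # "l" is the format string for int64
    schema.release = release_schema

    array = ArrowArray()          # note: no type information in here
    array.length = len(values)
    array.null_count = 0
    array.offset = 0
    array.n_buffers = 2
    array.buffers = ctypes.cast(buffers, ctypes.POINTER(ctypes.c_void_p))
    array.release = release_array

    # The caller must keep `data` and `buffers` alive while the structs are in use.
    return schema, array, (data, buffers)
```

The ArrowArray carries only lengths and buffer pointers; the "l" in the
ArrowSchema is what says the bytes are int64. As far as I understand the
function linked above, both addresses would then be handed over together,
along the lines of
pyarrow.Array._import_from_c(ctypes.addressof(array), ctypes.addressof(schema)),
with the keep-alive tuple held for as long as pyarrow uses the data. I have
not run that last step against this exact sketch.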

You can see examples of both of these in the unit tests (which use
cffi to create the C structs):

https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_cffi.py

If you're having trouble getting things to work, it would be helpful
if you could show what data exactly you are putting into the C
structures and how it is not returning the expected result when
imported into pyarrow.

On Sun, Mar 29, 2020 at 3:41 PM Neal Richardson
<neal.p.richard...@gmail.com> wrote:
>
> Hi Anish,
> You may be interested in how the Arrow R package uses the C interface to
> pass data to/from pyarrow. Both sides use the Arrow C++ library's
> implementation of the C interface. See
> https://github.com/apache/arrow/blob/master/r/src/py-to-r.cpp and
> https://github.com/apache/arrow/blob/master/r/R/py-to-r.R. The Arrow C++
> implementation is in
> https://github.com/apache/arrow/tree/master/cpp/src/arrow/c.
>
> Neal
>
> On Sun, Mar 29, 2020 at 12:14 PM Anish Biswas <anishbiswas...@gmail.com>
> wrote:
>
> > I have been trying to wrap my head around the CDataInterface.rst document
> > (https://github.com/apache/arrow/blob/master/docs/source/format/CDataInterface.rst)
> > for a few days now. What I am trying to do is use the C interface with
> > minimal dependencies to produce blocks of bytes that pyarrow can
> > reconstruct and work on as a normal pyarrow array (and vice versa: both
> > directions).
> >
> > Here's what I already tried doing.
> >
> >    - Created a C library that contains the two structs ArrowSchema and
> >    ArrowArray and some functions to export an int64_t array as an Arrow
> >    Array. This is very similar to what the document did with int32_t
> >    arrays.
> >    - Imported the C library in Python. Created an int64_t pyarrow.array.
> >    Serialized it to read the bytes via Numpy and populated the C struct I
> >    created using the C library function.
> >
> > What I expected was that the bytes would have some resemblance to each
> > other and that pyarrow would have some utility to pick up the ArrowArray
> > struct and treat it as an Arrow Array. But I couldn't get it to work.
> >
> > I am also confused about how to use ArrowSchema properly. The ArrowSchema
> > is the only structure that differentiates different ArrowArray formats.
> > However, I am not using it anywhere with the ArrowArray struct, or in any
> > kind of initialization that tells the Arrow library "The next structure
> > you will encounter will be of the kind that the ArrowSchema has provided
> > you", and that doesn't seem correct to me.
> >
> > It would really help me out if you could tell me whether I have
> > misinterpreted the doc or am doing something wrong. Thanks!
> >
