Re: [DISCUSS] C Data Interface, take 2

Wes McKinney Mon, 20 Jan 2020 09:59:50 -0800

hi Jacques,

Taking a step back from the discussion, the original problem statement
was to enable third party projects to produce the data structure used
by C++ Array classes in C without depending on the C++ code.

That's the ArrayData class here:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.h#L232

It is important for us to simplify the programming interface with the C++
library, so I think that we should address this as an endogenous
concern of the C++ project, namely providing a "C API for the C++
project". The C API for the C++ library needs to mirror what's in the
C++ project (i.e. the ArrayData data structure). We should not
advertise this as being a part of the project specification.

- Wes

On Mon, Jan 20, 2020 at 11:51 AM Jacques Nadeau <jacq...@apache.org> wrote:
>
> As I noted on the pull request, I think fundamentally this work is at odds
> with the Arrow specification and is being used to introduce a shadow
> specification.
>
> I don't think our intentions about how people should use something really
> influence how people will actually use or perceive it. They'll just find
> supported Arrow code and expose things based on it and call it "Arrow
> compatible". In other words, I don't think people in the outside world will
> be able to perceive the distinction between "Arrow C++ compatible" and
> "Arrow compatible".
>
> On Mon, Jan 20, 2020 at 9:28 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi folks,
> >
> > I just made a comment in https://github.com/apache/arrow/pull/6026
> > that I wanted to surface here on the mailing list.
> >
> > It seems that to reach consensus for a C interface that is intended to
> > be broadly used by multiple programming languages, we may make some
> > compromises that harm or outright undermine some of the use cases that
> > motivated the creation of the C interface in the first place. That
> > does not seem good. I wonder if it would be more productive to reduce
> > the scope of the project to merely providing a C-header-based data
> > interface to the C++ project only. That was the original problem
> > statement, and it seems that attempting to make it useful beyond C++ has
> > made it difficult to reach consensus.
> >
> > Thanks
> > Wes
> >
> > On Sat, Dec 21, 2019 at 4:38 PM Jacques Nadeau <jacq...@apache.org> wrote:
> > >
> > > Thanks for addressing my comments. I'm actively reviewing the proposal. It
> > > is taking me more time than I would like given the time of the year but I
> > > want to make sure that you know that I'm looking at it and hope to provide
> > > additional feedback beyond that which I've provided thus far on the PR.
> > > Will update soon.
> > >
> > > Thanks for your patience.
> > >
> > > On Tue, Dec 17, 2019 at 11:16 AM Antoine Pitrou <solip...@pitrou.net> wrote:
> > >
> > > >
> > > > Hello,
> > > >
> > > > Following Jacques's feedback, I drafted a new version of the C data
> > > > interface spec.
> > > >
> > > > The spec PR is here:
> > > > https://github.com/apache/arrow/pull/6040
> > > > Direct link to the RST file:
> > > > https://github.com/apache/arrow/blob/5d8669d371401f9db12326b079e13c0058ba972b/docs/source/format/CDataInterface.rst
> > > >
> > > > There is also a C++ implementation, together with a Python <-> R
> > > > bridge demonstrating the functionality:
> > > > https://github.com/apache/arrow/pull/6026
> > > >
> > > > The main change from the previous spec is that there are now two C
> > > > structures; one for the type or schema information, one for the
> > > > array or record batch data. This allows exchanging both kinds of
> > > > information independently (and so, potentially, to exchange schema once
> > > > and then multiple arrays or record batches).
> > > >
> > > > Comments and questions welcome.
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
> > > >
> > > >
> >
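
For readers following the thread, here is a minimal sketch of the
two-structure split Antoine describes above: one structure for the type
or schema information, one for the array data, so that a schema can be
exchanged once and then reused for several arrays. The struct names
(ExampleSchema, ExampleArray), their fields, the "i32" type code and the
sum_int32 helper are all hypothetical and are not taken from the draft
spec or from the C++ library. The only thing assumed from the Arrow
columnar format is that validity bitmaps use least-significant-bit
numbering.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type/schema description: sent once per stream. */
struct ExampleSchema {
    const char *format;               /* hypothetical type code, e.g. "i32" */
    const char *name;                 /* optional field name */
    int64_t n_children;               /* 0 for primitive types */
    struct ExampleSchema **children;  /* child schemas for nested types */
};

/* Hypothetical data description: one per array or record batch. */
struct ExampleArray {
    int64_t length;        /* number of logical elements */
    int64_t null_count;    /* number of null elements */
    int64_t offset;        /* logical start position within the buffers */
    int64_t n_buffers;     /* here: validity bitmap, then values */
    const void **buffers;  /* buffers[0] may be NULL when there are no nulls */
};

/* Consumer side: sum the non-null values of an int32 array, interpreting
 * the data through a schema that was received earlier.
 * Returns 0 on success, -1 if the schema or layout is not what we expect. */
static int sum_int32(const struct ExampleSchema *schema,
                     const struct ExampleArray *array,
                     int64_t *out)
{
    if (strcmp(schema->format, "i32") != 0 || array->n_buffers != 2) {
        return -1;
    }
    const uint8_t *validity = array->buffers[0];
    const int32_t *values = array->buffers[1];
    int64_t total = 0;
    for (int64_t i = 0; i < array->length; ++i) {
        int64_t j = array->offset + i;
        /* Bit j of the bitmap is set when element j is valid (LSB order). */
        if (validity == NULL || ((validity[j / 8] >> (j % 8)) & 1)) {
            total += values[j];
        }
    }
    *out = total;
    return 0;
}
```

A producer would fill in one ExampleSchema and then hand over any number
of ExampleArray values that share it; the consumer calls sum_int32 with
the same schema pointer each time. Keeping the two structures separate
is what avoids re-sending the type description with every batch.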
