hi Jacques,

Taking a step back from the discussion, the original problem statement was to enable third-party projects to produce the data structure used by C++ Array classes in C without depending on the C++ code. That's the ArrayData class here:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.h#L232

It is important for us to simplify the programming interface with the C++ library, so I think that we should address this as an endogenous concern of the C++ project, namely providing a "C API for the C++ project". The C API for the C++ library needs to mirror what's in the C++ project (i.e. the ArrayData data structure). We should not advertise this as being a part of the project specification.

- Wes

On Mon, Jan 20, 2020 at 11:51 AM Jacques Nadeau <jacq...@apache.org> wrote:
>
> As I noted on the pull request, I think fundamentally this work is at odds with the Arrow specification and is being used to introduce a shadow specification.
>
> I don't think our intentions about how people should use something really influence how people will actually use or perceive it. They'll just find supported Arrow code and expose things based on it and call it "Arrow compatible". In other words, I don't think people in the outside world will be able to perceive the distinction between "Arrow C++ compatible" and "Arrow compatible".
>
> On Mon, Jan 20, 2020 at 9:28 AM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > hi folks,
> >
> > I just made a comment in https://github.com/apache/arrow/pull/6026 that I wanted to surface here on the mailing list.
> >
> > It seems that to reach consensus for a C interface that is intended to be broadly used by multiple programming languages, we may make some compromises that harm or outright undermine some of the use cases that motivated the creation of the C interface in the first place. That does not seem good. I wonder if it would be more productive to reduce the scope of the project to merely providing a C-header-based data interface to the C++ project only. That was the original problem statement, and it seems that attempting to make it useful beyond C++ has made it difficult to reach consensus.
> >
> > Thanks
> > Wes
> >
> > On Sat, Dec 21, 2019 at 4:38 PM Jacques Nadeau <jacq...@apache.org> wrote:
> > >
> > > Thanks for addressing my comments. I'm actively reviewing the proposal. It is taking me more time than I would like given the time of the year, but I want to make sure that you know that I'm looking at it and hope to provide additional feedback beyond that which I've provided thus far on the PR. Will update soon.
> > >
> > > Thanks for your patience.
> > >
> > > On Tue, Dec 17, 2019 at 11:16 AM Antoine Pitrou <solip...@pitrou.net> wrote:
> > > >
> > > > Hello,
> > > >
> > > > Following Jacques's feedback, I drafted a new version of the C data interface spec.
> > > >
> > > > The spec PR is here:
> > > > https://github.com/apache/arrow/pull/6040
> > > > Direct link to the RST file:
> > > > https://github.com/apache/arrow/blob/5d8669d371401f9db12326b079e13c0058ba972b/docs/source/format/CDataInterface.rst
> > > >
> > > > There is also a C++ implementation, together with a Python <-> R bridge demonstrating the functionality:
> > > > https://github.com/apache/arrow/pull/6026
> > > >
> > > > The main change from the previous spec is that there are now two C structures: one for the type or schema information, one for the array or record batch data. This allows exchanging both kinds of information independently (and so, potentially, exchanging the schema once and then multiple arrays or record batches).
> > > >
> > > > Comments and questions welcome.
> > > >
> > > > Regards
> > > >
> > > > Antoine.
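For illustration, here is a minimal C sketch of the two-structure design Antoine describes: one structure for the type/schema, one for the array data, so a schema can be sent once and then reused for several arrays. It assumes the field layout of struct ArrowSchema and struct ArrowArray as later standardized by the Arrow C data interface; the draft under review in PR 6040 at the time of this thread may differ. The helpers export_int32_schema, export_int32_array and consume_int32_sum are hypothetical, written only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: field layout as later adopted by the Arrow C data interface. */
struct ArrowSchema {
  const char* format; /* type description, e.g. "i" for int32 */
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers; /* int32: [0] validity bitmap, [1] values */
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Hypothetical producer: schema strings are literals, so nothing to free. */
static void release_static_schema(struct ArrowSchema* schema) {
  schema->release = NULL; /* mark as released */
}

void export_int32_schema(struct ArrowSchema* out) {
  out->format = "i";
  out->name = "values";
  out->metadata = NULL;
  out->flags = 0; /* nullable flag not set */
  out->n_children = 0;
  out->children = NULL;
  out->dictionary = NULL;
  out->release = release_static_schema;
  out->private_data = NULL;
}

struct Int32Private {
  const void* buffers[2];
  int32_t* values;
};

static void release_int32_array(struct ArrowArray* array) {
  struct Int32Private* priv = (struct Int32Private*)array->private_data;
  free(priv->values);
  free(priv);
  array->release = NULL; /* mark as released */
}

/* Copies n values into a newly exported array; returns 0, or -1 on OOM. */
int export_int32_array(const int32_t* src, int64_t n, struct ArrowArray* out) {
  struct Int32Private* priv = malloc(sizeof *priv);
  if (priv == NULL) return -1;
  priv->values = malloc(n > 0 ? (size_t)n * sizeof(int32_t) : 1);
  if (priv->values == NULL) { free(priv); return -1; }
  if (n > 0) memcpy(priv->values, src, (size_t)n * sizeof(int32_t));
  priv->buffers[0] = NULL; /* no validity bitmap: no nulls */
  priv->buffers[1] = priv->values;
  out->length = n;
  out->null_count = 0;
  out->offset = 0;
  out->n_buffers = 2;
  out->n_children = 0;
  out->buffers = priv->buffers;
  out->children = NULL;
  out->dictionary = NULL;
  out->release = release_int32_array;
  out->private_data = priv;
  return 0;
}

/* Consumer: reads an array using a schema received earlier, then releases it. */
int64_t consume_int32_sum(const struct ArrowSchema* schema, struct ArrowArray* array) {
  assert(array->release != NULL); /* array must not already be released */
  int64_t sum = 0;
  if (strcmp(schema->format, "i") == 0 && array->null_count == 0) {
    const int32_t* values = (const int32_t*)array->buffers[1];
    for (int64_t i = 0; i < array->length; i++) sum += values[array->offset + i];
  }
  array->release(array); /* producer's callback frees its own buffers */
  return sum;
}
```

In this sketch the schema is exported once and reused to read every array, which is the "exchange schema once and then multiple arrays" case. Each array carries its own release callback, so the producer decides how its buffers are freed and the consumer never needs to know the producer's allocator.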