Unfortunately, Go can currently integrate with C++ libraries only through a C interface. SWIG exists as a generator for interface code between Go and C++, but ultimately it just automates the creation of a C interface and the Go glue code. Personally, I'm not a fan of the code that SWIG generates and haven't had much luck with it.
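To make "a C interface" concrete, here is a minimal sketch of the kind of shim this implies: C++ objects stay behind `extern "C"` functions, and the caller only ever holds an opaque integer handle. Everything in it is hypothetical and for illustration only; `FakeDataset` and the `fake_dataset_*` functions are stand-ins I made up, not Arrow's actual classes or API.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a C++ class; NOT an Arrow type.
class FakeDataset {
 public:
  explicit FakeDataset(std::string uri) : uri_(std::move(uri)) {}
  int64_t UriLength() const { return static_cast<int64_t>(uri_.size()); }

 private:
  std::string uri_;
};

extern "C" {

// Creates a dataset and returns an opaque handle: the address of a
// heap-allocated shared_ptr. Returns 0 on failure. No C++ exception is
// allowed to cross the C boundary.
uintptr_t fake_dataset_open(const char* uri) {
  if (uri == nullptr) return 0;
  try {
    auto* holder =
        new std::shared_ptr<FakeDataset>(std::make_shared<FakeDataset>(uri));
    return reinterpret_cast<uintptr_t>(holder);
  } catch (...) {
    return 0;
  }
}

// Calls a method on the object behind the handle; returns -1 for a zero handle.
int64_t fake_dataset_uri_len(uintptr_t handle) {
  if (handle == 0) return -1;
  auto* holder = reinterpret_cast<std::shared_ptr<FakeDataset>*>(handle);
  return (*holder)->UriLength();
}

// Destroys the heap-allocated shared_ptr, dropping its reference.
// The handle must not be used afterwards.
void fake_dataset_release(uintptr_t handle) {
  if (handle == 0) return;
  delete reinterpret_cast<std::shared_ptr<FakeDataset>*>(handle);
}

}  // extern "C"
```

Because the exported functions use only C types, cgo should be able to call them once they are declared in a C header, with the Go side storing the handle as a plain integer and calling the release function when it is done. The handle carries no type information, so passing a stale or wrong handle is undefined behavior; a real shim would need its own conventions for that.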
I have a working POC of using the datasets API via CGO through a C interface (basically just passing around a uintptr_t which is the address of a heap-allocated shared_ptr to a DatasetFactory/Dataset/Scanner, and using the C Data interface to pass the resulting record batches through without copying), but I couldn't decide on the best way to integrate the idea and clean it up into a real PR, hence this email thread.

I was initially porting the Dataset API to Go, but ran into the fact that it uses the compute expression classes to define things and perform the filtering, and realized that it wouldn't be a good idea to try porting the entire compute library. So it becomes a question of at what level I do the implementation, and at what level I make the calls through a C interface into the C++. The other question is whether that interface is a separate component from the existing dataset/compute libraries which can get linked into the Go, optionally as a separate module, so that it doesn't create a dependency on the C++ libraries for the current Arrow Go implementation, only for using the Dataset API stuff (and potentially the compute library).

--Matt

-----Original Message-----
From: Antoine Pitrou <anto...@python.org>
Sent: Monday, August 23, 2021 12:56 PM
To: dev@arrow.apache.org
Subject: Re: [C++][Go] CGO For Dataset API Integration

On 23/08/2021 at 18:22, Matthew Topol wrote:
> That's a fair point, and part of the work I've done so far is a local Go
> implementation of at least consuming the C data interface. It will also
> eventually involve creating the necessary implementation to produce the
> C-Data interface too. But specifically I'm asking for opinions on using that
> C-Data interface to build a C *programming* interface to the C++ Dataset API
> in the same vein as the JNI interface, so that Go could use the dataset api
> without having to reimplement the entirety of it.
>
> Given the difference between a *programming* interface and a *data*
> interface, I suppose the recommendation would be that creating a C
> Programming Interface for the Dataset API (using the C-Data interface for
> producing/consuming the actual Arrow data) should be a separate component
> like libarrow_dataset_jni rather than integrating it directly into the
> dataset component. Right?
>
> If it's not necessary for there to be Go specific things in the interface,
> then it could just be called *libarrow_dataset_c* or something equivalent,
> but would still be a separate component which just relies on the dataset api
> rather than being integrated into it. Does that make sense?

That does make sense, though I wonder how usable a C API to datasets would be. Being able to integrate with the C++ API from Go would probably make more sense.

Regards

Antoine.