adamreeve commented on PR #41886: URL: https://github.com/apache/arrow/pull/41886#issuecomment-2138628893
I've opened this draft PR to see if anyone has any feedback on the approach so far (@CurtHagenlocher or @kou?). There are still a few issues to sort out before this could be considered ready though.

I've taken the approach of using [gir.core](https://github.com/gircore/gir.core/) to generate C# code based on the GIR files that are created in the `c_glib` build. Alternatively we could use [GtkSharp](https://github.com/GtkSharp/GtkSharp), which is more mature but less active now and only really focused on building Gtk bindings, whereas gir.core describes itself as "A C# binding generator for GObject based libraries providing a C# friendly API surface", and says it "Allows 3rd party developers to write bindings for other GObject-based libraries." GtkSharp also has its own custom XML format that's different to gir. I haven't had much luck using their parser with the c_glib code but it might be possible to convert to their format from gir files. Another possible approach would be to manually write code to call into the C libraries, which would be a lot more labour-intensive.

There are two GObject based projects that get generated, one for the core Arrow library and one for the Dataset library, and I've created a higher level Dataset library that doesn't directly expose the GObject classes. This adds more work compared to just providing the GObject library and adding additional methods to the GObject based classes to add extra functionality, like in the Ruby library, but I think it's nicer this way to avoid having a mix of GObject and non-GObject based classes in the same API, e.g. one method to get a GObject `RecordBatchReader` and another to get an `Apache.Arrow.IArrowArrayStream`, or having to first get the GObject `RecordBatchReader` and then import it as an `IArrowArrayStream`.
I'm still on the fence about this approach though, and it is also a bit awkward for types in the core Arrow GLib library that don't have a corresponding type in the main .NET Arrow library, like the FileSystem class. I've created a FileSystem wrapper class in the high level Dataset library but this could cause issues if we later want to use a FileSystem in another GObject based library.

Remaining issues:

* Building in CI: gir.core needs separate Linux, Windows and macOS gir files. For local development I just generate a single file but for testing and releasing in CI we'll probably want to generate per-OS files in separate jobs and then combine them. gir.core is designed to work with gir files that are checked in to source control, but I don't think that makes sense for us when the c_glib and C# code live in the same repository.
* Generating .gir files is failing with MSVC. I have the c_glib libraries building after #41134, but generating the gir and typelib files doesn't work yet and fails with linker errors (e.g. see https://github.com/adamreeve/arrow/actions/runs/9261584329/job/25479186007).
* The gir.core GObject and GLib libraries are not signed/strong named. I've had to add `<SignAssembly>false</SignAssembly>` to the projects that depend on these for now. I'm not sure whether this should be a blocker. If needed, we could ask for signed versions of the packages to be published, or publish our own signed bindings to these libraries.
* gir.core doesn't have a way to generate methods with "new", "virtual" or "override" modifiers. There are a lot of methods in the c_glib libraries that exist in both parent and derived classes and I've patched the generated code to add modifiers to fix this. There are also a few "ToString" methods that hide `object.ToString` that I've had to add "override" to. I'm not sure whether this should be supported by gir.core or whether we just want a more robust way to fix these with post-processing.
* gir.core is used as a git submodule.
It would be nice if this was available as a tool that can be installed via NuGet, but this is probably not a blocker.
* The code generation is currently handled by a Bash script. Integrating this into MSBuild or as a source generator would be nicer, although gir.core generates code for multiple projects at once so this could be a bit fiddly.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
