adamreeve commented on PR #41886:
URL: https://github.com/apache/arrow/pull/41886#issuecomment-2138628893

   I've opened this draft PR to see if anyone has any feedback on the approach 
so far (@CurtHagenlocher or @kou?). There are still a few issues to sort out 
before this could be considered ready though.
   
   I've taken the approach of using 
[gir.core](https://github.com/gircore/gir.core/) to generate C# code based on 
the GIR files that are created in the `c_glib` build. Alternatively we could 
use [GtkSharp](https://github.com/GtkSharp/GtkSharp), which is more mature but 
less active now and only really focused on building Gtk bindings, whereas 
gir.core describes itself as "A C# binding generator for GObject based 
libraries providing a C# friendly API surface", and says it "Allows 3rd party 
developers to write bindings for other GObject-based libraries." GtkSharp also 
has its own custom XML format that's different to gir. I haven't had much luck 
using their parser with the c_glib code but it might be possible to convert to 
their format from gir files. Another possible approach would be to manually 
write code to call into the C libraries, which would be a lot more 
labour-intensive.
   
   There are two GObject-based projects that get generated, one for the core 
Arrow library and one for the Dataset library, and I've created a higher-level 
Dataset library that doesn't directly expose the GObject classes. This is more 
work than just providing the GObject library and adding extra methods to the 
GObject-based classes, as is done in the Ruby library, but I think it's nicer 
because it avoids mixing GObject and non-GObject-based classes in the same 
API, e.g. having one method to get a GObject `RecordBatchReader` and another 
to get an `Apache.Arrow.IArrowArrayStream`, or having to first get the GObject 
`RecordBatchReader` and then import it as an `IArrowArrayStream`.
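
   To make the trade-off concrete, here is a minimal sketch of the wrapper 
idea. All type names in it (`IBatchStream`, `GObjectBatchReader`, `Scanner`) 
are hypothetical stand-ins chosen so the example is self-contained. They are 
not the types in this PR, and `IBatchStream` only loosely plays the role of 
`IArrowArrayStream`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a .NET-native stream interface
// (playing the role of IArrowArrayStream in this sketch).
public interface IBatchStream : IDisposable
{
    // Returns false once the stream is exhausted.
    bool TryReadNext(out string batch);
}

// Hypothetical stand-in for a generated GObject-based reader class.
public sealed class GObjectBatchReader
{
    private readonly Queue<string> _batches;

    public GObjectBatchReader(IEnumerable<string> batches)
    {
        _batches = new Queue<string>(batches);
    }

    public bool TryReadNext(out string batch)
    {
        if (_batches.Count > 0)
        {
            batch = _batches.Dequeue();
            return true;
        }
        batch = string.Empty;
        return false;
    }
}

// Higher-level wrapper: the GObject-based reader is a private detail,
// so callers only ever see the .NET-style interface.
public sealed class Scanner
{
    private readonly GObjectBatchReader _reader;

    public Scanner(IEnumerable<string> batches)
    {
        _reader = new GObjectBatchReader(batches);
    }

    public IBatchStream ToStream() => new ReaderStream(_reader);

    private sealed class ReaderStream : IBatchStream
    {
        private readonly GObjectBatchReader _inner;

        public ReaderStream(GObjectBatchReader inner) => _inner = inner;

        public bool TryReadNext(out string batch) => _inner.TryReadNext(out batch);

        public void Dispose() { }
    }
}
```

   With this shape, a caller never has to obtain the GObject reader first and 
then convert it. The cost is one extra wrapper type per exposed class.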
   
   I'm still on the fence about this approach, though. It is also a bit 
awkward for types in the core Arrow GLib library that don't have a 
corresponding type in the main .NET Arrow library, like the FileSystem class. 
I've created a FileSystem wrapper class in the high-level Dataset library, but 
this could cause issues if we later want to use a FileSystem in another 
GObject-based library.
   
   Remaining issues:
   
   * Building in CI: gir.core needs separate Linux, Windows and macOS gir 
files. For local development I just generate a single file, but for testing 
and releasing in CI we'll probably want to generate per-OS files in separate 
jobs and then combine them. gir.core is designed to work with gir files that 
are checked into source control, but I don't think that makes sense for us 
when the c_glib and C# code live in the same repository.
   * Generating .gir files fails with MSVC. I have the c_glib libraries 
building after #41134, but generating the gir and typelib files doesn't work 
yet and fails with linker errors (e.g. see 
https://github.com/adamreeve/arrow/actions/runs/9261584329/job/25479186007).
   * The gir.core GObject and GLib libraries are not signed/strong-named. I've 
had to add `<SignAssembly>false</SignAssembly>` to the projects that depend on 
them for now. I'm not sure whether this should be a blocker. If needed, we 
could ask for signed versions of the packages to be published, or publish our 
own signed bindings to these libraries.
   * gir.core doesn't have a way to generate methods with `new`, `virtual` or 
`override` modifiers. Many methods in the c_glib libraries exist in both 
parent and derived classes, and I've patched the generated code to add 
modifiers to fix this. There are also a few `ToString` methods that hide 
`object.ToString`, to which I've had to add `override`. I'm not sure whether 
this should be supported by gir.core or whether we just want a more robust way 
to fix these with post-processing.
   * gir.core is used as a git submodule. It would be nice if it were 
available as a tool that can be installed via NuGet, but this is probably not 
a blocker.
   * The code generation is currently handled by a bash script. Integrating 
this into MSBuild or as a source generator would be nicer, although gir.core 
generates code for multiple projects at once so this could be a bit fiddly.
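
   As a rough illustration of the modifier issue in the list above: the 
classes below are hypothetical stand-ins, not actual gir.core output. They 
show the two situations described, a derived-class method with the same name 
as a parent method, and a `ToString` method that would otherwise hide 
`object.ToString`. Without the modifiers, the C# compiler reports warnings 
CS0108 and CS0114 respectively.

```csharp
using System;

// Hypothetical stand-in for a generated parent class.
public class BaseArray
{
    // Non-virtual: a same-named method in a derived class hides this one
    // (warning CS0108 unless the derived method is marked 'new').
    public string GetName() => "base";

    // Without 'override' this would hide object.ToString() (warning CS0114).
    // With 'override' it takes part in normal virtual dispatch.
    public override string ToString() => "BaseArray";
}

// Hypothetical stand-in for a generated derived class.
public class DerivedArray : BaseArray
{
    // 'new' makes the hiding explicit. Note that it is still hiding, not
    // overriding: calls through a BaseArray reference reach BaseArray.GetName.
    public new string GetName() => "derived";
}

public static class ModifierDemo
{
    // Resolved against the static type, so hiding does not apply here.
    public static string NameViaBase(BaseArray array) => array.GetName();

    // Virtual dispatch finds the overridden ToString.
    public static string Describe(object value) => value.ToString() ?? string.Empty;
}
```

   The difference matters for any post-processing fix: adding `new` only 
silences the warning, while `virtual` plus `override` changes which method is 
called through a parent-typed reference.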


