CurtHagenlocher commented on issue #588: URL: https://github.com/apache/arrow-adbc/issues/588#issuecomment-2800003879
Apologies in advance; this reply is going to be all over the place...

It's definitely possible in principle to take C# source and use "AOT" compilation to produce a native dynamic library. The main blocker for doing this with ADBC is a gap in the C# Arrow libraries. This gap is addressed by [https://github.com/apache/arrow/pull/40992](https://github.com/apache/arrow/pull/40992), which I haven't yet had time to analyze. Once that's done, the C# implementations of Spark and Databricks ADBC drivers can be AOT-compiled and consumed using the C APIs.

But this underscores the point that not all C# code is compatible with AOT compilation. The Google code we're using in the C# BigQuery implementation, for instance, uses Reflection in a way that is not AOT-compatible. Now I'd vaguely expect the latest C# SqlClient implementation to be AOT-compatible due to its relative importance in the ecosystem -- but pretty much the only way to determine this conclusively is to try it.

From a perhaps overly-selfish perspective, the only personal benefit I would see from an MSSQL ADBC driver is if it offered better performance than using SqlClient via its ADO.NET API. But because there's no way to build an ADBC driver on top of SqlClient without using its ADO.NET API, there's not actually a way to achieve better performance. And in fact, if the ultimate consumer is more .NET code then it will likely be somewhat worse, in that the strings would be converted from UTF-16 to UTF-8 for the Arrow API and then back to UTF-16 for the .NET consumer. (It's possible that this would be an inefficiency no matter how it's implemented if the TDS protocol specifies that "wide" strings are UTF-16.)

Now just because *I* don't get any value out of this doesn't mean it's not worthwhile to have it. After all, one of the goals of this project is to establish ADBC as a more widely-used standard, and that requires drivers.
But if the driver is strictly worse than what you get with other technologies, then at best it's a stopgap to help bootstrap an ecosystem rather than something valuable in its own right. To that end, I think it would be more interesting to build a generic ADO.NET-to-ADBC wrapper -- just like we have the ADBC-to-ADO.NET wrapper already in the codebase. Such a thing could be used not only with SqlClient but with Oracle's ADO.NET driver, Teradata's ADO.NET driver, the MySql ADO.NET driver, npgsql, etc. It would be the C# equivalent of an ADBC wrapper on top of ODBC, but without all that sketchy C code making people nervous ;). And if the underlying driver happened to support AOT compilation, then so would the combination of the driver and the wrapper.

This isn't to say that I'm not interested in an MSSQL ADBC driver; far from it! Like the other row-oriented database protocols, ADO.NET is pretty inefficient in the inner loop. To this, it often adds overhead in terms of boxing and other taxes. But of course, you can't fix that by building on the outside, only by changing the internals. I'd previously considered forking the SqlClient code and changing it so that we can fill Arrow structures directly instead of going through an extra layer. I'd also considered investigating the use of the "tiberius" crate to build a Rust-based ADBC driver. (I both prefer the Rust language to Go and appreciate its lack of a runtime for use in interop scenarios.) But I've also got several lifetimes of side projects I'd like to do and something of a problem staying focused on any one of them, so...

It's also occurred to me that my "real" goal isn't so much "ADBC everywhere" as it is "Arrow everywhere", and ADBC is more of a means to an end. From that perspective, one could argue that it's sufficient (and probably lower-cost) to modify existing database APIs to support Arrow instead of using a new API.
So perhaps the C# Arrow project could define the following:

```
interface IDbArrowCommand
{
    Task<IArrowArrayStream> ExecuteReaderAsync(CancellationToken cancellationToken);
    Task<IArrowArrayStreams> ExecuteMultipleReaderAsync(CancellationToken cancellationToken);
}

interface IArrowArrayStreams
{
    Task<IArrowArrayStream> GetNextResult(CancellationToken cancellationToken);
}
```

And then SqlClient or any other ADO.NET provider could be modified such that the `DbCommand` implementation also implements `IDbArrowCommand`, and a consumer could check for that support by doing a cast. And at that point, an ADBC wrapper arguably becomes more trivial to implement.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
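To make the "check for that support by doing a cast" idea concrete, here is a rough sketch of what the consumer side might look like. Everything beyond `DbCommand` and the proposed `IDbArrowCommand` is an assumption for illustration: `IArrowArrayStream` is redeclared as an empty stand-in so the snippet compiles without the Apache.Arrow package, and `TryExecuteArrowAsync`, `PlainCommand`, and `ArrowCapableCommand` are hypothetical names, not anything SqlClient or the Arrow libraries provide today.

```csharp
#nullable enable
using System;
using System.Data;
using System.Data.Common;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for Apache.Arrow's IArrowArrayStream so the sketch compiles without the package.
public interface IArrowArrayStream : IDisposable { }

// The proposed interface, trimmed to the single-result method.
public interface IDbArrowCommand
{
    Task<IArrowArrayStream> ExecuteReaderAsync(CancellationToken cancellationToken);
}

public static class ArrowCommandExtensions
{
    // Hypothetical helper: yields an Arrow stream if the provider opts in, otherwise null
    // so the caller can fall back to the ordinary DbDataReader path.
    public static async Task<IArrowArrayStream?> TryExecuteArrowAsync(
        this DbCommand command, CancellationToken cancellationToken = default)
    {
        if (command is IDbArrowCommand arrowCommand)
        {
            return await arrowCommand.ExecuteReaderAsync(cancellationToken).ConfigureAwait(false);
        }
        return null;
    }
}

// Toy provider command with no Arrow support; every member is a stub.
public class PlainCommand : DbCommand
{
    public override string CommandText { get; set; } = "";
    public override int CommandTimeout { get; set; }
    public override CommandType CommandType { get; set; } = CommandType.Text;
    public override bool DesignTimeVisible { get; set; }
    public override UpdateRowSource UpdatedRowSource { get; set; }
    protected override DbConnection? DbConnection { get; set; }
    protected override DbTransaction? DbTransaction { get; set; }
    protected override DbParameterCollection DbParameterCollection => throw new NotSupportedException();
    public override void Cancel() { }
    public override void Prepare() { }
    public override int ExecuteNonQuery() => throw new NotSupportedException();
    public override object? ExecuteScalar() => throw new NotSupportedException();
    protected override DbParameter CreateDbParameter() => throw new NotSupportedException();
    protected override DbDataReader ExecuteDbDataReader(CommandBehavior behavior) => throw new NotSupportedException();
}

// Toy provider command that opts in. The interface method is implemented explicitly
// because DbCommand already declares ExecuteReaderAsync(CancellationToken).
public sealed class ArrowCapableCommand : PlainCommand, IDbArrowCommand
{
    private sealed class EmptyStream : IArrowArrayStream { public void Dispose() { } }

    Task<IArrowArrayStream> IDbArrowCommand.ExecuteReaderAsync(CancellationToken cancellationToken)
        => Task.FromResult<IArrowArrayStream>(new EmptyStream());
}
```

One wrinkle this surfaces: `DbCommand` already has a public `ExecuteReaderAsync(CancellationToken)` returning `Task<DbDataReader>`, so a provider adopting the interface as written would probably need an explicit interface implementation (as in the toy class here) or a differently named method. A generic ADBC wrapper could call something like `TryExecuteArrowAsync` first and fall back to reading rows through `DbDataReader` when it returns null.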