Re: [I] MSSQL support [arrow-adbc]

via GitHub Sun, 13 Apr 2025 08:45:53 -0700


CurtHagenlocher commented on issue #588:
URL: https://github.com/apache/arrow-adbc/issues/588#issuecomment-2800003879


   Apologies in advance; this reply is going to be all over the place...
   
   It's definitely possible in principle to take C# source and use "AOT" 
compilation to produce a native dynamic library. The main blocker for doing 
this with ADBC is a gap in the C# Arrow libraries. This gap is addressed by 
[https://github.com/apache/arrow/pull/40992](https://github.com/apache/arrow/pull/40992),
 which I haven't yet had time to analyze. Once that's done, the C# 
implementations of Spark and Databricks ADBC drivers can be AOT-compiled and 
consumed using the C APIs.
   
   But this underscores the point that not all C# code is compatible with AOT 
compilation. The Google code we're using in the C# BigQuery implementation, for 
instance, uses Reflection in a way that is not. Now I'd vaguely expect the 
latest C# SqlClient implementation to be AOT-compatible due to its relative 
importance in the ecosystem -- but pretty much the only way to determine this 
conclusively is to try it.
   
   From a perhaps overly-selfish perspective, the only personal benefit I would 
see from an MSSQL ADBC driver is if it offered better performance than using 
SqlClient via its ADO.NET API. But because there's no way to build an ADBC 
driver on top of SqlClient without using its ADO.NET API, there's not actually 
a way to achieve better performance. And in fact, if the ultimate consumer is 
more .NET code then it will likely be somewhat worse in that the strings would 
be converted from UTF-16 to UTF-8 for the Arrow API and then back to UTF-16 for 
the .NET consumer. (It's possible that this would be an inefficiency no matter 
how it's implemented if the TDS protocol specifies that "wide" strings are 
UTF-16.)
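
   To make the double conversion concrete, here is a minimal sketch that uses only `System.Text.Encoding`. The variable names are made up for illustration, and no real SqlClient or Arrow call is involved; it only shows the two transcoding passes a .NET consumer would pay for each string value.
   ```csharp
   using System;
   using System.Text;

   // .NET strings are UTF-16 in memory, which is what an ADO.NET reader hands back.
   string fromAdoNet = "naïve café";

   // Pass 1: transcode to UTF-8 so the value can live in an Arrow string buffer.
   byte[] arrowUtf8 = Encoding.UTF8.GetBytes(fromAdoNet);

   // Pass 2: a .NET consumer reading the Arrow data transcodes back to UTF-16.
   string backInDotNet = Encoding.UTF8.GetString(arrowUtf8);

   // The round trip preserves the text; the cost is the two extra copies.
   Console.WriteLine(backInDotNet == fromAdoNet);
   ```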
   
   Now just because *I* don't get any value out of this doesn't mean it's not 
worthwhile to have it. After all, one of the goals of this project is to 
establish ADBC as a more widely-used standard, and that requires drivers. But 
if the driver is strictly worse than what you get with other technologies, then 
at best it's a stopgap to help bootstrap an ecosystem rather than something 
valuable in its own right.
   
   To that end, I think it would be more interesting to build a generic 
ADO.NET-to-ADBC wrapper -- just like we have the ADBC-to-ADO.NET wrapper 
already in the codebase. Such a thing could be used not only with SqlClient but 
with Oracle's ADO.NET driver, Teradata's ADO.NET driver, the MySql ADO.NET 
driver, npgsql, etc. It would be the C# equivalent of an ADBC wrapper on top of 
ODBC, but without all that sketchy C code making people nervous ;). And if the 
underlying driver happened to support AOT compilation, then so would the 
combination of the driver and the wrapper.
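
   As a rough sketch of the core of such a wrapper, the code below drains any `DbDataReader` into a single Arrow `RecordBatch`. `AdoNetToArrow` and `ReadAll` are hypothetical names, only `int` and `string` columns are handled, and a real wrapper would need batching, the full type mapping, and the ADBC driver surface around it.
   ```csharp
   using System;
   using System.Collections.Generic;
   using System.Data.Common;
   using Apache.Arrow;
   using Apache.Arrow.Types;

   public static class AdoNetToArrow
   {
       // Hypothetical helper: reads every row of the reader into one RecordBatch.
       public static RecordBatch ReadAll(DbDataReader reader)
       {
           int columns = reader.FieldCount;
           var fields = new List<Field>(columns);
           var appenders = new Action<DbDataReader, int>[columns];
           var finishers = new Func<IArrowArray>[columns];

           for (int i = 0; i < columns; i++)
           {
               Type clrType = reader.GetFieldType(i);
               if (clrType == typeof(int))
               {
                   var builder = new Int32Array.Builder();
                   appenders[i] = (r, c) =>
                   {
                       if (r.IsDBNull(c)) builder.AppendNull();
                       else builder.Append(r.GetInt32(c)); // typed getter, no boxing
                   };
                   finishers[i] = () => builder.Build();
                   fields.Add(new Field(reader.GetName(i), Int32Type.Default, true));
               }
               else if (clrType == typeof(string))
               {
                   var builder = new StringArray.Builder();
                   appenders[i] = (r, c) =>
                   {
                       if (r.IsDBNull(c)) builder.AppendNull();
                       else builder.Append(r.GetString(c)); // UTF-16 to UTF-8 happens here
                   };
                   finishers[i] = () => builder.Build();
                   fields.Add(new Field(reader.GetName(i), StringType.Default, true));
               }
               else
               {
                   throw new NotSupportedException($"Column type {clrType} is not handled in this sketch.");
               }
           }

           // Row-at-a-time inner loop: this is the part that building on the
           // outside of an ADO.NET provider cannot avoid.
           int rows = 0;
           while (reader.Read())
           {
               for (int i = 0; i < columns; i++) appenders[i](reader, i);
               rows++;
           }

           var arrays = new IArrowArray[columns];
           for (int i = 0; i < columns; i++) arrays[i] = finishers[i]();
           return new RecordBatch(new Schema(fields, null), arrays, rows);
       }
   }
   ```
   Because the helper depends only on `DbDataReader`, the same code would apply to any provider that exposes one.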
   
   This isn't to say that I'm not interested in an MSSQL ADBC driver; far from 
it! Like the other row-oriented database protocols, ADO.NET is pretty 
inefficient in the inner loop. To this, it often adds overhead in terms of 
boxing and other taxes. But of course, you can't fix that by building on the 
outside, only by changing the internals. I'd previously considered forking the 
SqlClient code and changing it so that we can fill Arrow structures directly 
instead of going through an extra layer. I'd also considered investigating the 
use of the "tiberius" crate to build a Rust-based ADBC driver. (I both prefer 
the Rust language to Go and appreciate its lack of a runtime for use in interop 
scenarios.) But I've also got several lifetimes of side projects I'd like to do 
and something of a problem staying focused on any one of them, so ... .
   
   It's also occurred to me that my "real" goal isn't so much "ADBC everywhere" 
as it is "Arrow everywhere", and ADBC is more of a means to an end. From that 
perspective, one could argue that it's sufficient (and probably lower-cost) to 
modify existing database APIs to support Arrow instead of using a new API. So 
perhaps the C# Arrow project could define the following:
   ```
   interface IDbArrowCommand
   {
       Task<IArrowArrayStream> ExecuteReaderAsync(CancellationToken cancellationToken);
       Task<IArrowArrayStreams> ExecuteMultipleReaderAsync(CancellationToken cancellationToken);
   }
   
   interface IArrowArrayStreams
   {
       Task<IArrowArrayStream> GetNextResult(CancellationToken cancellationToken);
   }
   ```
   And then SqlClient or any other ADO.NET provider could be modified such that 
the `DbCommand` implementation also implements `IDbArrowCommand`, and a 
consumer could check for that support by doing a cast. And at that point, an 
ADBC wrapper arguably becomes much simpler to implement.
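
   The consumer-side check could look roughly like the sketch below. The two interfaces are the proposed ones from above, repeated so the snippet compiles on its own; `ArrowCommandExtensions` and `TryExecuteArrowAsync` are hypothetical names, and no existing ADO.NET provider is known to implement `IDbArrowCommand`.
   ```csharp
   #nullable enable
   using System.Data.Common;
   using System.Threading;
   using System.Threading.Tasks;
   using Apache.Arrow.Ipc;

   // Proposed interfaces, repeated here so the sketch is self-contained.
   public interface IDbArrowCommand
   {
       Task<IArrowArrayStream> ExecuteReaderAsync(CancellationToken cancellationToken);
       Task<IArrowArrayStreams> ExecuteMultipleReaderAsync(CancellationToken cancellationToken);
   }

   public interface IArrowArrayStreams
   {
       Task<IArrowArrayStream> GetNextResult(CancellationToken cancellationToken);
   }

   public static class ArrowCommandExtensions
   {
       // Hypothetical helper: returns an Arrow stream when the provider's
       // DbCommand also implements IDbArrowCommand, and null otherwise so the
       // caller can fall back to the row-oriented DbDataReader path.
       public static async Task<IArrowArrayStream?> TryExecuteArrowAsync(
           this DbCommand command, CancellationToken cancellationToken = default)
       {
           if (command is IDbArrowCommand arrowCommand)
           {
               return await arrowCommand.ExecuteReaderAsync(cancellationToken);
           }
           return null;
       }
   }
   ```
   A caller would use `await command.TryExecuteArrowAsync()` and call `ExecuteReaderAsync()` on the plain `DbCommand` when the result is null.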


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
