zeroshade commented on code in PR #1655:
URL: https://github.com/apache/arrow-adbc/pull/1655#discussion_r1538133636
##########
docs/source/format/how_manager.rst:
##########
@@ -0,0 +1,204 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+..
+.. http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+================================================
+How Drivers and the Driver Manager Work Together
+================================================
+
+.. note:: This document focuses on drivers/applications using the C API
+   definitions in adbc.h. That means C/C++/Python/Ruby, and possibly
+   C#/Go.
+
+When an application calls a function like
+:cpp:func:`AdbcStatementExecuteQuery`, how does it "know" what function in
+which driver to actually call?
+
+This can happen in a few ways. In the simplest case, the application links to
+a single driver, and directly calls ADBC functions explicitly defined by the
+driver:
+
+.. figure:: DriverDirectLink.mmd.svg
+
+   In the simplest case, an application directly links to the driver and calls
+   ADBC functions.
+
+This doesn't work with multiple drivers, or applications that don't/can't link
+directly to drivers (think dynamic loading, perhaps in a language like
+Python). For this case, ADBC provides a table of function pointers
+(:cpp:struct:`AdbcDriver`), and a way to request this table from a driver.
+Then, the application proceeds in two steps. First, it dynamically loads a
+driver and calls an entrypoint function to get the function table:
+
+.. figure:: DriverTableLoad.mmd.svg
+
+   Now, the application asks the driver for a table of functions to call.
+
+Then, the application uses the driver by calling the functions in the table:
+
+.. figure:: DriverTableUse.mmd.svg
+
+   The application uses the table to call driver functions. This approach
+   scales to multiple drivers.
+
+Dealing with the table, however, is messy. So the overall recommended
+approach is to use the ADBC driver manager. This is a library that pretends
+to be a single driver that can be linked to and used "like normal".
+Internally, it loads the table of function pointers and tracks which
+database/connection/statement objects need which "actual" driver, making it
+easy to dynamically load drivers at runtime and use multiple drivers from the
+same application:
+
+.. figure:: DriverManagerUse.mmd.svg
+
+   The application uses driver manager to "feel like" it's just using a single
+   driver. The driver manager handles the details behind the scenes.
+
+In More Detail
+==============
+
+The adbc.h header ties everything together. It is the abstract API
+definition, akin to interface/trait/protocol definitions in other languages.
+C being C, however, all it consists of is a bunch of function prototypes and
+struct definitions without any implementation.
+
+A driver, at its core, is just a library that implements those function
+prototypes in adbc.h. Those functions may be implemented in C, or they can be
+implemented in a different language and exported through language-specific FFI
+mechanisms. For example, the Go and C# implementations of ADBC can both
+export drivers to consumers who expect the C API definitions. As long as the
+definitions in adbc.h are implemented somehow, then the application is
+generally none the wiser when it comes to what's actually underneath.
+
+How does an application call these functions, though? Here, there are several
+options.
+
+Again, the simplest case is as follows: if (1) the application links directly
+to the driver, and (2) the driver exposes the ADBC functions *under the same
+name* as in adbc.h, then the application can just ``#include <adbc.h>`` and
+call ``AdbcStatementExecuteQuery(...)`` directly. Here, the application and
+driver have a relationship no different than any other C library.
+
+.. figure:: DriverDirectLink.mmd.svg
+
+   In the simplest case, an application directly links to the driver and calls
+   ADBC functions. When the application calls ``StatementExecuteQuery``, that
+   is directly provided by the driver that it links against.
+
+Unfortunately, this doesn't work as well in other scenarios. For example, if
+an application wishes to use multiple ADBC drivers, this no longer works: both
+drivers define the same functions (the ones in adbc.h), and when the
+application links both of them, the linker has no way of telling which
+driver's function is meant when the application calls an ADBC function. On
+top of that, this violates the `One Definition Rule`_.

Review Comment:
   It might end up being an issue in the future, so we should at least keep it in mind.
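The following self-contained C sketch shows why the function-table approach described in the hunk sidesteps the duplicate-symbol (One Definition Rule) problem. It is a rough illustration under simplifying assumptions. Every name in it (`DriverTable`, `driver_a_init`, `driver_b_init`, `execute_query_via`) is hypothetical. It is not the real `AdbcDriver` struct from adbc.h, whose members (such as `StatementExecuteQuery`) take ADBC handles and an error out-parameter. It also does not use the real init-function signature (`AdbcDriverInitFunc`).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for the AdbcDriver function
 * table; NOT the real struct from adbc.h. */
struct DriverTable {
  const char* driver_name;
  /* One "slot" per API function; real tables have many more. */
  int (*execute_query)(const char* query);
};

/* Driver A's implementation. Being static (internal linkage), it is not
 * exported, so it cannot collide with another driver's symbol at link time. */
static int driver_a_execute_query(const char* query) {
  return (int)strlen(query); /* stand-in result: length of the query */
}

/* Driver B's implementation of the same slot, with different behavior. */
static int driver_b_execute_query(const char* query) {
  (void)query; /* unused in this toy driver */
  return -1;   /* stand-in result: fixed sentinel */
}

/* Hypothetical entrypoints: each driver fills in a caller-provided table.
 * When linked statically, these names must differ per driver; otherwise
 * they would clash just like the API functions themselves. */
void driver_a_init(struct DriverTable* table) {
  table->driver_name = "driver_a";
  table->execute_query = driver_a_execute_query;
}

void driver_b_init(struct DriverTable* table) {
  table->driver_name = "driver_b";
  table->execute_query = driver_b_execute_query;
}

/* Driver-manager-style dispatch in miniature: call through whichever
 * table the caller selected. */
int execute_query_via(const struct DriverTable* table, const char* query) {
  assert(table != NULL && table->execute_query != NULL);
  return table->execute_query(query);
}
```

Usage: after `struct DriverTable a, b; driver_a_init(&a); driver_b_init(&b);`, the call `execute_query_via(&a, "SELECT 1")` returns 8 and `execute_query_via(&b, "SELECT 1")` returns -1. Both drivers fill the same role but are reached through separate tables, so neither exports a conflicting symbol. The real driver manager goes further. As the hunk describes, it loads drivers at runtime and tracks which table each database/connection/statement object belongs to.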
