lidavidm commented on code in PR #14079:
URL: https://github.com/apache/arrow/pull/14079#discussion_r976814653


##########
docs/source/format/ADBC.rst:
##########
@@ -0,0 +1,307 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+=================================
+ADBC: Arrow Database Connectivity
+=================================
+
+Rationale
+=========
+
+The Arrow ecosystem lacks standard database interfaces built around
+Arrow data, especially for efficiently fetching large datasets
+(i.e. with minimal or no serialization and copying).  Without a common
+API, the end result is a mix of custom protocols (e.g. BigQuery,
+Snowflake) and adapters (e.g. Turbodbc_) scattered across languages.
+Consumers must laboriously wrap individual systems (as `DBI is
+contemplating`_ and `Trino does with connectors`_).
+
+ADBC aims to provide a minimal database client API standard, based on
+Arrow, for C, Go, and Java (with bindings for other languages).
+Applications code to this API standard (in much the same way as they
+would with JDBC or ODBC), but fetch result sets in Arrow format
+(e.g. via the :doc:`C Data Interface <./CDataInterface>`).  They then
+link to an implementation of the standard: either directly to a
+vendor-supplied driver for a particular database, or to a driver
+manager that abstracts across multiple drivers.  Drivers implement the
+standard using a database-specific API, such as Flight SQL.
+
+Goals
+-----
+
+- Provide a cross-language, Arrow-based API to standardize how clients
+  submit queries to and fetch Arrow data from databases.
+- Support both SQL dialects and the emergent `Substrait`_ standard.
+- Support explicitly partitioned/distributed result sets to work
+  better with contemporary distributed systems.
+- Allow for a variety of implementations to maximize reach.
+
+Non-goals
+---------
+
+- Replacing JDBC/ODBC in all use cases, particularly `OLTP`_ use
+  cases.
+- Requiring or enshrining a particular database protocol for the Arrow
+  ecosystem.
+
+Example use cases
+-----------------
+
+A C or C++ application wishes to retrieve bulk data from a Postgres
+database for further analysis.  The application is compiled against
+the ADBC header, and executes queries via the ADBC APIs.  The
+application is linked against the ADBC libpq driver.  At runtime, the
+driver submits queries to the database via the Postgres client
+libraries, and retrieves row-oriented results, which it then converts
+to Arrow format before returning them to the application.
+
+If the application wishes to retrieve data from a database supporting
+Flight SQL instead, it would link against the ADBC Flight SQL driver.
+At runtime, the driver would submit queries via Flight SQL and get
+back Arrow data, which is then passed unchanged and uncopied to the
+application.  (The application may have to edit the SQL queries, as
+ADBC does not translate between SQL dialects.)
+
+If the application wishes to work with multiple databases, it would
+link against the ADBC driver manager, and specify the desired driver
+at runtime.  The driver manager would pass on API calls to the correct
+driver, which handles the request.
+
+ADBC API Standard 1.0.0
+=======================
+
+ADBC is a language-specific set of interface definitions that can be
+implemented directly by a vendor-specific "driver" or a vendor-neutral
+"driver manager".
+
+Version 1.0.0 of the standard corresponds to tag adbc-1.0.0 of the
+repository ``apache/arrow-adbc``, which is commit
+7866a566f5b7b635267bfb7a87ea49b01dfe89fa_.  Note that this is separate
+from releases of the actual implementations.
+
+In C, ADBC consists of a self-contained header.  The header is
+reproduced in full at the end of this document, and is intended to be
+self-documenting.
+
+In Go, ADBC consists of a module,
+``github.com/apache/arrow-adbc/go/adbc``, containing interface
+definitions.
+
+In Java, ADBC consists of a package,
+``org.apache.arrow.adbc:adbc-core``, containing interface definitions.
+
+Updating this specification
+===========================
+
+ADBC is versioned separately from the core Arrow project.  The API
+standard and components (driver manager, drivers) are also versioned
+separately, but both follow semantic versioning.
+
+For example: components may make backwards-compatible releases as
+1.0.0, 1.0.1, 1.1.0, 1.2.0, etc.  They may also release
+backwards-incompatible versions such as 2.0.0 that still implement
+version 1.0.0 of the API standard.
+
+Similarly, this documentation describes the ADBC API standard version
+1.0.0.  If/when an ABI-compatible revision is made
+(e.g. new standard options are defined), the next version would be
+1.1.0.  If incompatible changes are made (e.g. new API functions), the
+next version would be 2.0.0.
+
+Related work
+============
+
+The initial proposal included a survey of existing solutions and
+systems, which is reproduced below for context.  Note that the
+descriptions are only kept up to date on a best-effort basis.
+
+Comparison with Arrow Flight SQL
+--------------------------------
+
+Flight SQL is a **client-server protocol** aimed at database
+developers.  By implementing Flight SQL, a database can support
+clients that use ADBC, JDBC, and ODBC.
+
+ADBC is an **API specification** aimed at database clients.  By
+coding to ADBC, an application can get Arrow data from a variety of
+databases that use different client technologies underneath.
+
+Hence, the two projects complement each other.  While Flight SQL
+provides a client that can be used directly, we expect applications
+would prefer to use ADBC instead of tying themselves to a particular
+database.

Review Comment:
   Thanks!
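
For reference, the version-bump rule described under "Updating this specification" could be sketched roughly as below. This is only an illustration of the rule as the text states it: `Version` and `next_standard_version` are hypothetical names, not part of ADBC or any of its bindings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """Hypothetical semantic version triple (not an ADBC type)."""

    major: int
    minor: int
    patch: int = 0

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def next_standard_version(current: Version, abi_compatible: bool) -> Version:
    """Apply the bump rule from the text to the API standard version."""
    if abi_compatible:
        # ABI-compatible revision, e.g. new standard options are defined.
        return Version(current.major, current.minor + 1, 0)
    # Incompatible change, e.g. new API functions are added.
    return Version(current.major + 1, 0, 0)


standard = Version(1, 0, 0)
print(next_standard_version(standard, abi_compatible=True))   # prints 1.1.0
print(next_standard_version(standard, abi_compatible=False))  # prints 2.0.0
```

Component versions (driver manager, drivers) would be tracked separately from this value, since the text versions them independently of the standard.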


