[GitHub] [arrow-adbc] zeroshade commented on a diff in pull request #504: docs: update README and add FAQ

via GitHub Thu, 09 Mar 2023 09:26:56 -0800


zeroshade commented on code in PR #504:
URL: https://github.com/apache/arrow-adbc/pull/504#discussion_r1131345676



##########
README.md:
##########
@@ -22,32 +22,41 @@
 
[![License](http://img.shields.io/:license-Apache%202-blue.svg)](https://github.com/apache/arrow-adbc/blob/master/LICENSE.txt)
 
 ADBC is an API standard (version 1.0.0) for database access libraries 
("drivers") in C, Go, and Java that uses Arrow for result sets and query 
parameters.
-Instead of writing code for each individual database, applications can build 
against the ADBC APIs, and link against drivers that implement the standard.
+Instead of writing code to extract Arrow data out of each individual database, 
applications can build against the ADBC APIs, and link against drivers that 
implement the standard.

Review Comment:
   Should we mention ingestion here too, not just extraction?



##########
README.md:
##########
@@ -22,32 +22,41 @@
 
[![License](http://img.shields.io/:license-Apache%202-blue.svg)](https://github.com/apache/arrow-adbc/blob/master/LICENSE.txt)
 
 ADBC is an API standard (version 1.0.0) for database access libraries 
("drivers") in C, Go, and Java that uses Arrow for result sets and query 
parameters.
-Instead of writing code for each individual database, applications can build 
against the ADBC APIs, and link against drivers that implement the standard.
+Instead of writing code to extract Arrow data out of each individual database, 
applications can build against the ADBC APIs, and link against drivers that 
implement the standard.
 Additionally, a JDBC/ODBC-style driver manager is provided. This also 
implements the ADBC APIs, but dynamically loads drivers and dispatches calls to 
them.
 
-Like JDBC/ODBC, the goal is to provide a generic API for multiple databases, 
but ADBC is focused on Arrow-based data access for analytics use cases (bulk 
data retrieval/ingestion), and not the full spectrum of use cases that 
JDBC/ODBC drivers handle.
+Like JDBC/ODBC, the goal is to provide a generic API for multiple databases.
+ADBC, however, focuses on Arrow-based data access for analytics use cases - 
primarily bulk data retrieval and ingestion, and not the full spectrum of use 
cases that JDBC/ODBC support.

Review Comment:
   Maybe this?
   
   ```suggestion
   Like JDBC/ODBC, the goal is to provide a generic API for multiple databases. 
ADBC, however, is focused on bulk columnar data retrieval and ingestion through 
an Arrow-based API rather than attempting to replace JDBC/ODBC in all use cases.
   ```



##########
docs/source/faq.rst:
##########
@@ -0,0 +1,148 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+================================
+Frequently Asked Questions (FAQ)
+================================
+
+What exactly is ADBC?
+=====================
+
+ADBC is:
+
+- A set of abstract APIs in different languages (C/C++, Go, and Java)
+  for working with databases and Arrow data.
+
+  For example, result sets of queries in ADBC are all returned as
+  streams of Arrow data, not row-by-row.
+- A set of implementations of that API in different languages (C/C++,
+  Go, Java, Python, and Ruby) that target different databases
+  (e.g. PostgreSQL, SQLite, any database supporting Flight SQL).
+
+Why not just use JDBC/ODBC?
+===========================
+
+JDBC uses row-based interfaces like `ResultSet`_.  When working with
+columnar data, like Arrow data, this means that we have to convert the
+data at least once and possibly twice:
+
+- Once (possibly) in the driver or database, to take columnar data and
+  convert it into a row-based format so it can be returned through the
+  JDBC APIs.
+- Once (always) when a client application pulls data from the JDBC
+  API, to convert the rows into columns.
+
+In keeping with Arrow's "zero-copy" or "minimal-copy" ethos, we would
+like to avoid these unnecessary conversions.
+
+ODBC is in a similar situation.  Although ODBC does support
+`"column-wise binding"`_, not all ODBC drivers support it, and it is
+more complex to use.  Additionally, ODBC uses caller-allocated buffers
+(which often means forcing a data copy), and ODBC specifies data
+layouts that are not quite Arrow-compatible (requiring a data
+conversion anyways).
+
+.. _ResultSet: 
https://docs.oracle.com/javase/8/docs/api/java/sql/ResultSet.html
+.. _"column-wise binding": 
https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/column-wise-binding?view=sql-server-ver16
+
+How do ADBC and Arrow Flight SQL differ?
+========================================
+
+ADBC is an *API abstraction*.  It doesn't specify what goes on between

Review Comment:
   *Client* API _specification_?



##########
docs/source/faq.rst:
##########
@@ -0,0 +1,148 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+================================
+Frequently Asked Questions (FAQ)
+================================
+
+What exactly is ADBC?
+=====================
+
+ADBC is:
+
+- A set of abstract APIs in different languages (C/C++, Go, and Java)

Review Comment:
   If we're going to mention C/C++, Go, and Java, should we also mention the 
languages that are still in progress (Rust)?



##########
docs/source/faq.rst:
##########
@@ -0,0 +1,148 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+================================
+Frequently Asked Questions (FAQ)
+================================
+
+What exactly is ADBC?
+=====================
+
+ADBC is:
+
+- A set of abstract APIs in different languages (C/C++, Go, and Java)
+  for working with databases and Arrow data.
+
+  For example, result sets of queries in ADBC are all returned as
+  streams of Arrow data, not row-by-row.
+- A set of implementations of that API in different languages (C/C++,
+  Go, Java, Python, and Ruby) that target different databases
+  (e.g. PostgreSQL, SQLite, any database supporting Flight SQL).
+
+Why not just use JDBC/ODBC?
+===========================
+
+JDBC uses row-based interfaces like `ResultSet`_.  When working with
+columnar data, like Arrow data, this means that we have to convert the
+data at least once and possibly twice:
+
+- Once (possibly) in the driver or database, to take columnar data and
+  convert it into a row-based format so it can be returned through the
+  JDBC APIs.
+- Once (always) when a client application pulls data from the JDBC
+  API, to convert the rows into columns.
+
+In keeping with Arrow's "zero-copy" or "minimal-copy" ethos, we would
+like to avoid these unnecessary conversions.
+
+ODBC is in a similar situation.  Although ODBC does support
+`"column-wise binding"`_, not all ODBC drivers support it, and it is
+more complex to use.  Additionally, ODBC uses caller-allocated buffers
+(which often means forcing a data copy), and ODBC specifies data
+layouts that are not quite Arrow-compatible (requiring a data
+conversion anyways).
+
+.. _ResultSet: 
https://docs.oracle.com/javase/8/docs/api/java/sql/ResultSet.html
+.. _"column-wise binding": 
https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/column-wise-binding?view=sql-server-ver16
+
+How do ADBC and Arrow Flight SQL differ?
+========================================
+
+ADBC is an *API abstraction*.  It doesn't specify what goes on between
+your client and the database, just the API calls that you make as an
+application developer.  Under the hood, a driver must take those API
+calls and talk to the actual database.  Another perspective is that
+ADBC is all about the client-side, and specifies nothing about the
+network protocol or server-side implementation.
+
+Flight SQL is a *wire protocol*.  It specifies the exact commands to
+send to a database to perform various actions like authenticating with
+the database, creating prepared statements, or executing queries.
+Flight SQL specifies the network protocol that the client and the
+server must implement.
+
+One more way of looking at it: an ADBC driver can be written for a
+database purely as a client library.  (That's how the PostgreSQL
+driver in this repository is implemented, for instance—as a wrapper
+around libpq.)  But adding Flight SQL support to a database means
+either modifying the database to run a Flight SQL service, or putting
+the database behind a proxy that translates between Flight SQL and the
+database.
+
+Why not just use Arrow Flight SQL?
+==================================
+
+Continuing on from the previous question:

Review Comment:
   I don't think the `Continuing on from the previous question:` lines are 
necessary, but that's just my opinion.



##########
README.md:
##########
@@ -22,32 +22,41 @@
 
[![License](http://img.shields.io/:license-Apache%202-blue.svg)](https://github.com/apache/arrow-adbc/blob/master/LICENSE.txt)
 
 ADBC is an API standard (version 1.0.0) for database access libraries 
("drivers") in C, Go, and Java that uses Arrow for result sets and query 
parameters.
-Instead of writing code for each individual database, applications can build 
against the ADBC APIs, and link against drivers that implement the standard.
+Instead of writing code to extract Arrow data out of each individual database, 
applications can build against the ADBC APIs, and link against drivers that 
implement the standard.
 Additionally, a JDBC/ODBC-style driver manager is provided. This also 
implements the ADBC APIs, but dynamically loads drivers and dispatches calls to 
them.
 
-Like JDBC/ODBC, the goal is to provide a generic API for multiple databases, 
but ADBC is focused on Arrow-based data access for analytics use cases (bulk 
data retrieval/ingestion), and not the full spectrum of use cases that 
JDBC/ODBC drivers handle.
+Like JDBC/ODBC, the goal is to provide a generic API for multiple databases.
+ADBC, however, focuses on Arrow-based data access for analytics use cases - 
primarily bulk data retrieval and ingestion, and not the full spectrum of use 
cases that JDBC/ODBC support.
 Hence, ADBC is complementary to those existing standards.
 
-Like [Arrow Flight SQL][flight-sql], ADBC is an Arrow-based database access 
API.
-But Flight SQL also specifies the wire format and network transport (Flight 
RPC), while ADBC lets drivers make their own decisions.
+Like [Arrow Flight SQL][flight-sql], ADBC is an Arrow-based way to work with 
databases.
+But Flight SQL is a database access _protocol_ that also defines the wire 
format and network transport, so it's only useful if a database specifically 
implements support for it.
+ADBC lets drivers wrap existing database protocols, whether Arrow-native or 
not.

Review Comment:
   ```suggestion
   However, Flight SQL is a _protocol_ defining a wire format and network 
transport as opposed to an _API specification_. Flight SQL requires a database 
to specifically implement support for it, while ADBC is a client API 
specification for wrapping existing database protocols which could be 
Arrow-native or not.
   ```
   



##########
docs/source/faq.rst:
##########
@@ -0,0 +1,148 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+================================
+Frequently Asked Questions (FAQ)
+================================
+
+What exactly is ADBC?
+=====================
+
+ADBC is:
+
+- A set of abstract APIs in different languages (C/C++, Go, and Java)
+  for working with databases and Arrow data.
+
+  For example, result sets of queries in ADBC are all returned as
+  streams of Arrow data, not row-by-row.
+- A set of implementations of that API in different languages (C/C++,
+  Go, Java, Python, and Ruby) that target different databases
+  (e.g. PostgreSQL, SQLite, any database supporting Flight SQL).
+
+Why not just use JDBC/ODBC?
+===========================
+
+JDBC uses row-based interfaces like `ResultSet`_.  When working with
+columnar data, like Arrow data, this means that we have to convert the
+data at least once and possibly twice:
+
+- Once (possibly) in the driver or database, to take columnar data and
+  convert it into a row-based format so it can be returned through the
+  JDBC APIs.
+- Once (always) when a client application pulls data from the JDBC
+  API, to convert the rows into columns.
+
+In keeping with Arrow's "zero-copy" or "minimal-copy" ethos, we would
+like to avoid these unnecessary conversions.
+
+ODBC is in a similar situation.  Although ODBC does support
+`"column-wise binding"`_, not all ODBC drivers support it, and it is
+more complex to use.  Additionally, ODBC uses caller-allocated buffers
+(which often means forcing a data copy), and ODBC specifies data
+layouts that are not quite Arrow-compatible (requiring a data
+conversion anyways).
+
+.. _ResultSet: 
https://docs.oracle.com/javase/8/docs/api/java/sql/ResultSet.html
+.. _"column-wise binding": 
https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/column-wise-binding?view=sql-server-ver16
+
+How do ADBC and Arrow Flight SQL differ?
+========================================
+
+ADBC is an *API abstraction*.  It doesn't specify what goes on between
+your client and the database, just the API calls that you make as an
+application developer.  Under the hood, a driver must take those API
+calls and talk to the actual database.  Another perspective is that
+ADBC is all about the client-side, and specifies nothing about the
+network protocol or server-side implementation.
+
+Flight SQL is a *wire protocol*.  It specifies the exact commands to
+send to a database to perform various actions like authenticating with
+the database, creating prepared statements, or executing queries.
+Flight SQL specifies the network protocol that the client and the
+server must implement.
+
+One more way of looking at it: an ADBC driver can be written for a
+database purely as a client library.  (That's how the PostgreSQL
+driver in this repository is implemented, for instance—as a wrapper
+around libpq.)  But adding Flight SQL support to a database means
+either modifying the database to run a Flight SQL service, or putting
+the database behind a proxy that translates between Flight SQL and the
+database.
+
+Why not just use Arrow Flight SQL?
+==================================
+
+Continuing on from the previous question:
+
+Because ADBC is client-side, ADBC can support databases that either
+don't support returning Arrow data, or support Arrow data through a
+protocol besides Flight SQL.
+
+Then do we even need Arrow Flight SQL?
+======================================
+
+Continuing on from the previous two questions:
+
+Flight SQL is a concrete protocol that database vendors can implement,
+instead of designing their own protocol.  And Flight SQL also has JDBC
+and ODBC drivers for maximal compatibility.
+
+As an analogy: many databases implement the PostgreSQL wire protocol,
+so that they can gain access to all the PostgreSQL clients, including
+JDBC and ODBC drivers.  (And JDBC/ODBC users can still use other
+drivers to work with other databases.)
+
+For the Arrow ecosystem, we hope databases will implement the Flight
+SQL wire protocol, giving them access to all the Flight SQL clients,
+including ADBC, JDBC, and ODBC drivers.  (And ADBC users can still use
+other drivers to work with other databases.)
+
+So what is the "ADBC Flight SQL driver" then?
+=============================================
+
+The ADBC Flight SQL driver implements the ADBC API standard (which an
+application interacts with) using the Flight SQL wire protocol (which
+a database server exposes).  So it's a generic driver that can talk to
+many databases, and it also implements a generic API (ADBC) designed
+to abstract over multiple databases.
+
+This is a little unusual, in that most database drivers and database
+protocols you'll find were meant for a specific database.  But Flight
+SQL was designed to be agnostic to the database from the start, and so
+was ADBC.  It sounds like they overlap, but they complement each other
+because they operate at different levels of abstraction.  Flight SQL
+targets database server developers, and gives them one protocol they
+can implement in order to reach ADBC, JDBC, and ODBC users.  ADBC
+targets database users, and gives them one Arrow-native API they can
+use to work with both Arrow-native and non-Arrow-native databases.

Review Comment:
   It might be worthwhile to state explicitly that database server developers 
who provide a Flight SQL interface won't have to build and distribute their 
own drivers for ADBC, JDBC, or ODBC. Consumers can use the existing Flight SQL 
drivers for all three instead, which reduces the maintenance burden on 
database developers.
   
   The exact wording is flexible, but I think it's worthwhile to be explicit 
about that rather than letting people draw that conclusion on their own.
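   
   A toy sketch of that point, in case it helps with the wording. Every name 
here (`ClientDriver`, `SharedProtocolServer`, `SharedProtocolDriver`, 
`run_everywhere`) is hypothetical and stands in for the roles only; none of it 
is the real ADBC or Flight SQL API. The idea is that one generic driver serves 
every server that speaks the shared protocol, so no server has to ship its own 
driver.

```python
from abc import ABC, abstractmethod


class ClientDriver(ABC):
    """Hypothetical stand-in for a client-side API (the role ADBC plays)."""

    @abstractmethod
    def execute(self, query: str) -> str:
        ...


class SharedProtocolServer:
    """Hypothetical database that speaks a shared wire protocol
    (the role Flight SQL plays)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def handle(self, command: str) -> str:
        # A real server would parse and run the command; this only echoes it.
        return f"{self.name} ran: {command}"


class SharedProtocolDriver(ClientDriver):
    """One generic driver that works with any SharedProtocolServer."""

    def __init__(self, server: SharedProtocolServer) -> None:
        self.server = server

    def execute(self, query: str) -> str:
        return self.server.handle(query)


def run_everywhere(servers, query):
    # The same driver class is reused for every server; no per-database driver.
    return [SharedProtocolDriver(server).execute(query) for server in servers]
```

   Calling `run_everywhere` with two differently named servers uses the same 
driver class for both, which is the maintenance saving described above.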



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
