lidavidm commented on code in PR #248:
URL: https://github.com/apache/arrow-site/pull/248#discussion_r1055647370
##########
_posts/2022-12-31-arrow-adbc.md:
##########
@@ -0,0 +1,217 @@
+---
+layout: post
+title: "Introducing ADBC: Database Access for Apache Arrow"
+date: "2022-12-31 00:00:00"
+author: pmc
+categories: [application]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Arrow community would like to introduce version 1.0.0 of the [Arrow Database Connectivity (ADBC)][adbc] specification.
+ADBC is a columnar, minimal-overhead alternative to JDBC/ODBC for analytical applications.
+Or in other words: **ADBC is a single API for getting Arrow data in and out of different databases**.
+
+## Motivation
+
+Applications often use API standards like [JDBC][jdbc] and [ODBC][odbc] to work with databases.
+That way, they can code to the same API regardless of the underlying database, saving on development time.
+Roughly speaking, when an application executes a query with these APIs:
+
+1. The application submits a SQL query via the JDBC/ODBC API.
+2. The query is passed on to the driver.
+3. The driver translates the query to a database-specific protocol and sends it to the database.
+4. The database executes the query and returns the result set in a database-specific format.
+5. The driver translates the result format into the JDBC/ODBC API.
+6. The application iterates over the result rows using the JDBC/ODBC API.
+
+<figure style="text-align: center;">
+  <img src="{{ site.baseurl }}/img/ADBCFlow1.svg" width="90%" class="img-responsive" alt="A diagram showing the query execution flow.">
+  <figcaption>The query execution flow.</figcaption>
+</figure>
+
+When columnar data comes into play, however, problems arise.
+JDBC is a row-oriented API, and while ODBC can support columnar data, its type system and data representation are not a perfect match with Arrow.
+So generally, columnar data must be converted to rows between steps 5 and 6, spending resources without performing "useful" work.
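That wasted round trip can be sketched in a few lines of plain Python: the driver flattens columns into rows for the row-oriented API, only for a columnar client to transpose them straight back. (Illustrative only; real drivers operate on Arrow buffers, not Python lists.)

```python
# Illustrative sketch of the row/column round trip described above.
# Real systems work on Arrow buffers, not Python lists.

# The database's columnar result set: one list per column.
columns = {"id": [1, 2, 3], "name": ["a", "b", "c"]}

# Step 5: the driver converts columns into rows for the
# row-oriented JDBC/ODBC API.
rows = list(zip(*columns.values()))  # [(1, 'a'), (2, 'b'), (3, 'c')]

# The columnar client then transposes the rows right back into
# columns, recovering exactly what the database already had.
roundtripped = dict(zip(columns.keys(), map(list, zip(*rows))))
assert roundtripped == columns  # no new information, just spent CPU
```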
+
+This mismatch is problematic for columnar database systems, such as ClickHouse, Dremio, DuckDB, and Google BigQuery.
+On the client side, tools such as Apache Spark and pandas would be better off getting columnar data directly, skipping that conversion.
+Otherwise, they're leaving performance on the table.
+At the same time, that conversion isn't always avoidable.
+Row-oriented database systems like PostgreSQL aren't going away, and these clients will still want to consume data from them.
+
+Developers have a few options:
+
+- *Just use JDBC/ODBC*.
+  These standards are here to stay, and it makes sense for databases to support them for applications that want them.
+  But when both the database and the application are columnar, that means converting data into rows for JDBC/ODBC, only for the client to convert them right back into columns!
+  Performance suffers, and developers have to spend time implementing the conversions.
+- *Use JDBC/ODBC to Arrow conversion libraries*.
Review Comment:
thanks!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]