Hi Neville,

In Python we have something called the DB API 2.0 (PEP 249) that
defines an API standard for SQL databases in Python, including an
expectation around the data format of result sets. It sounds like you
need to create the equivalent of that in Rust with Arrow as the API /
format returned by fetch/fetchall operations.
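For illustration, the calling convention and result-set format look like this with the stdlib sqlite3 module, which implements PEP 249 (fetchall() returns the result set as a list of row tuples):

```python
# DB API 2.0 (PEP 249) in action via the stdlib sqlite3 driver.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])
cur.execute("SELECT id, name FROM t ORDER BY id")
rows = cur.fetchall()  # standard result-set format: [(1, 'a'), (2, 'b')]
conn.close()
```

The Rust equivalent being discussed would return Arrow record batches from the fetch/fetchall step instead of row tuples.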

Once you define a standard API for SQL database interactions, then you
can start creating multiple implementations of that API that can be
passed interchangeably into applications. Applications, of course, are
responsible for knowing which version of SQL a driver supports, but
this layer is agnostic to the SQL strings that get passed in for
queries. In Python it's a little easier to manage because of duck
typing (an implementation of the DB API 2.0 does not need to depend on
any libraries) and there's a standard test suite you can use to verify
compliance with the API.
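To illustrate the duck-typing point: application code can accept any object that follows the PEP 249 shape, and a driver needs no shared base class or library dependency to conform. A minimal sketch (FakeConnection/FakeCursor are hypothetical names, not part of any real driver):

```python
# A toy "driver" that conforms to the DB API 2.0 shape without
# importing anything. Real drivers would talk to a database; this
# one returns canned rows to show the interchangeability.

class FakeCursor:
    def __init__(self, rows):
        self._rows = rows
        self._result = []

    def execute(self, operation, parameters=()):
        # A real driver would send `operation` to the server.
        self._result = list(self._rows)

    def fetchall(self):
        return self._result


class FakeConnection:
    def cursor(self):
        return FakeCursor([(1, "a"), (2, "b")])


def load_names(conn):
    # Application code: works against any PEP 249-style connection.
    cur = conn.cursor()
    cur.execute("SELECT id, name FROM t")
    return [name for _id, name in cur.fetchall()]


names = load_names(FakeConnection())  # ['a', 'b']
```

In Rust, the same interchangeability would instead come from a shared trait that drivers implement.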

FWIW, I would like to do the same thing for Arrow C++, to create a
DBAPI 2.0-like API that can be implemented by database driver
interfaces maintained by the community as well as by third-party
projects. It might make sense for us to create a version of this API
that can be used from C/CFFI with the Arrow C data interface. I
suspect we'll get to this eventually -- PEP 249 came 10 years into the
Python programming language, and I often liken Arrow's growth pattern
to that of a programming language.

- Wes

On Sat, Sep 26, 2020 at 9:22 PM Neville Dipale <nevilled...@gmail.com> wrote:
>
> Hi Arrow developers
>
> I would like to gauge the appetite for an Arrow SQL connector that:
>
> * Reads and writes Arrow data to and from SQL databases
> * Reads tables and queries into record batches, and writes batches to
> tables (either append or overwrite)
> * Leverages binary SQL formats where available (e.g. PostgreSQL format is
> relatively easy and well-documented)
> * Provides a batch interface that abstracts away the different database
> semantics, and exposes a RecordBatchReader (
> https://docs.rs/arrow/1.0.1/arrow/record_batch/trait.RecordBatchReader.html),
> and perhaps a RecordBatchWriter
> * Resides in the Rust repo as either an arrow::sql module (like arrow::csv,
> arrow::json, arrow::ipc) or alternatively is a separate crate in the
> workspace (*arrow-sql*?)
>
> I would be able to contribute a Postgres reader/writer as a start.
> I could make this a separate crate, but to drive adoption I would prefer
> this living in Arrow; it would also stay up to date (we sometimes
> reorganise modules and end up breaking dependencies).
>
> Also, being developed next to DataFusion could allow DF to support SQL
> databases, as this would be yet another datasource.
>
> Some questions:
> * Should such a library support async IO, sync IO, or both?
> * Other than postgres, what other databases would be interesting? Here I'm
> hoping that once we've established a suitable API, it could be easier to
> natively support more database types.
>
> Potential concerns:
>
> * Sparse database support
> It's a lot of effort to write database connectors, especially when starting
> from scratch (unlike with, say, JDBC). What if we only end up supporting
> 1 or 2 database servers?
> Perhaps in that case we could keep the module without publishing it to
> crates.io until we're happy with database support, or even its usage.
>
> * Dependency bloat
> We could feature-gate database types to reduce the number of dependencies
> if one only wants certain DB connectors.
>
> * Why not use Java's JDBC adapter?
> I already do this, but when working on a Rust project, standing up a
> separate JVM service solely to extract Arrow data is a lot of effort.
> I also don't think it's currently possible to use the adapter to save Arrow
> data in a database.
>
> * What about Flight SQL extensions?
> There have been discussions around creating Flight SQL extensions, and the
> Rust SQL adapter could implement that and co-exist well.
> From a crate-dependency perspective, *arrow-flight* depends on *arrow*,
> so it could also depend on this *arrow-sql* crate.
>
> Please let me know what you think
>
> Regards
> Neville
