Re: [VOTE] Adopt ADBC database client connectivity specification

Gavin Ray Thu, 22 Sep 2022 10:47:02 -0700

I suppose you're thinking from a memory/performance perspective, right?
Allocating a dot character is a lot better than allocating multiple arrays.


Yeah, I don't see why not -- this could even be a library internal, where the
fact that it's dotted is an implementation detail.
Then in the Java implementation (or any other), you can call
".getFullyQualifiedTableName()", which will do the allocating parse to a
List<String> for you.
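
A rough sketch of what such a library-internal helper might look like. The
class and method names here (DottedNames, join, parse) are hypothetical and
not part of ADBC, and the backslash escape for names that themselves contain
a dot is an assumption; the thread does not settle how escaping would work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the ADBC API.
final class DottedNames {
    private DottedNames() {}

    // Join path segments into a single dotted string.
    // Assumption: '.' separates segments and '\' escapes a literal '.' or '\'.
    static String join(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append('.');
            }
            for (char c : parts.get(i).toCharArray()) {
                if (c == '.' || c == '\\') {
                    sb.append('\\');
                }
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // The "allocating parse": split a dotted name back into its segments.
    static List<String> parse(String dotted) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < dotted.length(); i++) {
            char c = dotted.charAt(i);
            if (c == '\\' && i + 1 < dotted.length()) {
                // Escaped character: take the next one literally.
                current.append(dotted.charAt(++i));
            } else if (c == '.') {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }
}
```

Applications that only display the name can keep using the single string;
only callers that need the full hierarchy pay for the parse.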

The array was mostly for convenience's sake (our API is JSON and not
particularly performance-oriented).
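
For comparison, here is a minimal sketch of the array form (option 2 in the
quoted message below): given only a flat list of array-valued table names,
the namespace hierarchy can be rebuilt with a single fold over the list. The
NamespaceTree type and fromTables method are made up for illustration and are
not part of ADBC or of our API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical type: one node per namespace level.
final class NamespaceTree {
    // Child namespaces keyed by name (sorted for stable iteration).
    final Map<String, NamespaceTree> children = new TreeMap<>();
    // Names of tables that live directly in this namespace.
    final List<String> tables = new ArrayList<>();

    // Fold a flat list of fully-qualified table names into a tree.
    // Every element except the last is a namespace; the last is the table.
    static NamespaceTree fromTables(List<List<String>> fqtns) {
        NamespaceTree root = new NamespaceTree();
        for (List<String> fqtn : fqtns) {
            if (fqtn.isEmpty()) {
                continue; // nothing to place
            }
            NamespaceTree node = root;
            for (String ns : fqtn.subList(0, fqtn.size() - 1)) {
                node = node.children.computeIfAbsent(ns, k -> new NamespaceTree());
            }
            node.tables.add(fqtn.get(fqtn.size() - 1));
        }
        return root;
    }
}
```

With the foo/bar/baz/qux/table1 layout from the quoted example, the tree ends
up four namespace levels deep with table1 at the bottom, and nothing in it
needs to know which level counts as the "catalog" or the "schema".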

On Thu, Sep 22, 2022 at 1:40 PM David Li <[email protected]> wrote:

> Ah, interesting…
>
> A self-recursive schema wouldn't work in Arrow's schema system, so it'd
> have to be the latter solution. Or, would it work to have a dotted name in
> the schema name column? Would parsing that back out (for applications that
> want to work with the full hierarchy) be too much trouble?
>
> On Thu, Sep 22, 2022, at 13:14, Gavin Ray wrote:
> > Antoine, I can't comment on the Go code (not qualified) but to me, the
> > "verification" test examples look like a mixture between JDBC and Java
> > FlightSQL driver usage, and seem solid.
> >
> > There was one reservation I had about the ability to handle datasource
> > namespacing that I brought up early on in the proposal discussions
> > (David responded to it but I got busy and forgot to reply again)
> >
> > If you have a datasource which provides possibly arbitrary levels of
> > schema namespace (something like Apache Calcite, for example),
> > how do you represent the table/schema names?
> >
> > Suppose I have a service with a DB layout like this:
> >
> > / foo
> >     / bar
> >         / baz
> >             /qux
> >               / table1
> >                 - column1
> >
> > At my day job, we have a technology which is very similar to
> > ADBC/FlightSQL
> > (would be great to adopt Substrait + ADBC once they're mature enough):
> > - https://github.com/hasura/graphql-engine/blob/master/dc-agents/README.md#data-connectors
> > - https://techcrunch.com/2022/06/28/hasura-now-lets-developers-turn-any-data-source-into-a-graphql-api/
> >
> > We wound up having to redesign the specification to handle datasources
> > that don't fit the "database-schema-table" or "database-table" mould.
> >
> > In the ADBC schema for schema metadata, it looks like it expects a
> > single "schema" struct:
> > https://github.com/apache/arrow-adbc/blob/7866a566f5b7b635267bfb7a87ea49b01dfe89fa/java/core/src/main/java/org/apache/arrow/adbc/core/StandardSchemas.java#L132-L152
> >
> > If you want to be flexible, IMO it would be good to either:
> >
> > 1. Have DB_SCHEMA_SCHEMA be self-recursive, so that schemas (with or
> > without tables) can be nested arbitrarily deep underneath each other
> >       - Fully-Qualified-Table-Name (FQTN) can then be computed by walking
> > up from a table and concatenating the schema names until the root schema
> > is reached
> >
> > 2. Make "catalog" and "schema" go away entirely, and tables just have a
> > FQTN that is an array, a database is a collection of tables
> >      - You can compute what would have been the catalog + schema
> > hierarchy by doing a .reduce() over the list of tables and
> >
> > Or maybe there is another, better way. But that's my $0.02 and the only
> > real concern about the API I have, without actually trying to build
> > something with it.
> >
> >
> >
> >
> >
> > On Thu, Sep 22, 2022 at 5:40 AM Antoine Pitrou <[email protected]> wrote:
> >
> >>
> >> Hello,
> >>
> >> I would urge people to review the proposed ADBC APIs, especially the Go
> >> and Java APIs which probably benefitted from less feedback than the C
> >> one.
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >>
> >> On 21/09/2022 at 17:40, David Li wrote:
> >> > Hello,
> >> >
> >> > We have been discussing [1] standard interfaces for Arrow-based
> >> > database access and have been working on implementations of the
> >> > proposed interfaces [2], all under the name "ADBC". This proposal aims
> >> > to provide a unified client abstraction across Arrow-native database
> >> > protocols (like Flight SQL) and non-Arrow database protocols, which can
> >> > then be used by Arrow projects like Dataset/Acero and ecosystem
> >> > projects like Ibis.
> >> >
> >> > For details, see the RFC here:
> >> > https://github.com/apache/arrow/pull/14079
> >> >
> >> > I would like to propose that the Arrow project adopt this RFC, along
> >> > with apache/arrow-adbc commit 7866a56 [3], as version 1.0.0 of the ADBC
> >> > API standard.
> >> >
> >> > Please vote to adopt the specification as described above. (This is
> >> > not a vote to release any components.)
> >> >
> >> > This vote will be open for at least 72 hours.
> >> >
> >> > [ ] +1 Adopt the ADBC specification
> >> > [ ]  0
> >> > [ ] -1 Do not adopt the specification because...
> >> >
> >> > Thanks to the DuckDB and R DBI projects for providing feedback on and
> >> > implementations of the proposal.
> >> >
> >> > [1]: https://lists.apache.org/thread/cq7t9s5p7dw4vschylhwsfgqwkr5fmf2
> >> > [2]: https://github.com/apache/arrow-adbc
> >> > [3]: https://github.com/apache/arrow-adbc/commit/7866a566f5b7b635267bfb7a87ea49b01dfe89fa
> >> >
> >> > Thank you,
> >> > David
> >>
>
