Ah yeah that's true, good point
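
For what it's worth, a rough sketch of the round-trip problem (plain
Python; the helper names are made up for illustration, not part of any
ADBC API): a dotted name only survives the trip back to a list of parts
if both sides agree on the separator and on an escaping scheme for it.

```python
# Hypothetical scheme: '.' separates parts, '\' escapes a literal '.'
# or '\' inside a part. Both client and server must know this.
SEP = "."
ESC = "\\"

def join_parts(parts):
    """Join name parts into one string, escaping separators inside parts."""
    return SEP.join(
        p.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP) for p in parts
    )

def split_parts(name):
    """Inverse of join_parts: split on unescaped separators only."""
    parts, buf, i = [], [], 0
    while i < len(name):
        c = name[i]
        if c == ESC and i + 1 < len(name):
            buf.append(name[i + 1])  # escaped character taken literally
            i += 2
        elif c == SEP:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(c)
            i += 1
    parts.append("".join(buf))
    return parts

# A part containing a literal dot (a valid identifier in some
# datasources) survives the round trip:
fqtn = ["catalog", "my.schema", "table1"]
assert split_parts(join_parts(fqtn)) == fqtn
```

Without the escaping convention being shared (e.g. as metadata), the
client can't tell "my.schema" the identifier apart from two levels.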
On Thu, Sep 22, 2022 at 2:38 PM David Li <lidav...@apache.org> wrote:

> I suppose the separator would have to be known to the client somehow
> (perhaps as metadata) - you'd have the same problem in the opposite
> direction if the result were a list, right? You wouldn't be able to
> concatenate the parts together without knowing a safe separator to use.
>
> On Thu, Sep 22, 2022, at 14:23, Gavin Ray wrote:
> > Wait, what happens if a datasource's spec allows dots as valid
> > identifiers?
> >
> > On Thu, Sep 22, 2022 at 2:22 PM Gavin Ray <ray.gavi...@gmail.com> wrote:
> >
> >> Ah okay, yeah that's a reasonable angle too haha
> >>
> >> On Thu, Sep 22, 2022 at 1:59 PM David Li <lidav...@apache.org> wrote:
> >>
> >>> Frankly, it was from a "not drastically refactoring things"
> >>> perspective :)
> >>>
> >>> At least for Arrow: list[utf8] is effectively a utf8 array with an
> >>> extra array of offsets, so there's relatively little overhead. (In
> >>> particular, there's not an extra allocation per array; there's just
> >>> an overall allocation of a bitmap/offsets buffer.)
> >>>
> >>> On Thu, Sep 22, 2022, at 13:46, Gavin Ray wrote:
> >>> > I suppose you're thinking from a memory/performance perspective,
> >>> > right?
> >>> > Allocating a dot character is a lot better than allocating
> >>> > multiple arrays.
> >>> >
> >>> > Yeah, I don't see why not -- this could even be a library internal,
> >>> > where the fact that it's dotted is an implementation detail.
> >>> > Then in the Java implementation or whatnot, you can call
> >>> > ".getFullyQualifiedTableName()", which will do the allocating parse
> >>> > to a List<String> for you.
> >>> >
> >>> > The array was mostly for convenience's sake (our API is JSON and
> >>> > not particularly performance-oriented)
> >>> >
> >>> > On Thu, Sep 22, 2022 at 1:40 PM David Li <lidav...@apache.org> wrote:
> >>> >
> >>> >> Ah, interesting…
> >>> >>
> >>> >> A self-recursive schema wouldn't work in Arrow's schema system, so
> >>> >> it'd have to be the latter solution. Or, would it work to have a
> >>> >> dotted name in the schema name column? Would parsing that back out
> >>> >> (for applications that want to work with the full hierarchy) be
> >>> >> too much trouble?
> >>> >>
> >>> >> On Thu, Sep 22, 2022, at 13:14, Gavin Ray wrote:
> >>> >> > Antoine, I can't comment on the Go code (not qualified) but to
> >>> >> > me, the "verification" test examples look like a mixture between
> >>> >> > JDBC and Java FlightSQL driver usage, and seem solid.
> >>> >> >
> >>> >> > There was one reservation I had about the ability to handle
> >>> >> > datasource namespacing that I brought up early on in the
> >>> >> > proposal discussions (David responded to it, but I got busy and
> >>> >> > forgot to reply again).
> >>> >> >
> >>> >> > If you have a datasource which provides possibly arbitrary
> >>> >> > levels of schema namespace (something like Apache Calcite, for
> >>> >> > example), how do you represent the table/schema names?
> >>> >> >
> >>> >> > Suppose I have a service with a DB layout like this:
> >>> >> >
> >>> >> > /foo
> >>> >> >   /bar
> >>> >> >     /baz
> >>> >> >     /qux
> >>> >> >       /table1
> >>> >> >         - column1
> >>> >> >
> >>> >> > At my dayjob, we have a technology which is very similar to
> >>> >> > ADBC/FlightSQL (it would be great to adopt Substrait + ADBC once
> >>> >> > they're mature enough):
> >>> >> > - https://github.com/hasura/graphql-engine/blob/master/dc-agents/README.md#data-connectors
> >>> >> > - https://techcrunch.com/2022/06/28/hasura-now-lets-developers-turn-any-data-source-into-a-graphql-api/
> >>> >> >
> >>> >> > We wound up having to redesign the specification to handle
> >>> >> > datasources that don't fit the "database-schema-table" or
> >>> >> > "database-table" mould.
> >>> >> >
> >>> >> > In the ADBC schema for schema metadata, it looks like it expects
> >>> >> > a single "schema" struct:
> >>> >> > https://github.com/apache/arrow-adbc/blob/7866a566f5b7b635267bfb7a87ea49b01dfe89fa/java/core/src/main/java/org/apache/arrow/adbc/core/StandardSchemas.java#L132-L152
> >>> >> >
> >>> >> > If you want to be flexible, IMO it would be good to either:
> >>> >> >
> >>> >> > 1. Have DB_SCHEMA_SCHEMA be self-recursive, so that schemas
> >>> >> >    (with or without tables) can be nested arbitrarily deep
> >>> >> >    underneath each other
> >>> >> >    - The Fully-Qualified-Table-Name (FQTN) can then be computed
> >>> >> >      by walking up from a table and concatenating the schema
> >>> >> >      names until the root schema is reached
> >>> >> >
> >>> >> > 2.
Make "catalog" and "schema" go away entirely, and tables just
> >>> >> >    have a FQTN that is an array; a database is a collection
> >>> >> >    of tables
> >>> >> >    - You can compute what would have been the catalog + schema
> >>> >> >      hierarchy by doing a .reduce() over the list of tables
> >>> >> >
> >>> >> > Or maybe there is another, better way. But that's my $0.02 and
> >>> >> > the only real concern about the API I have, without actually
> >>> >> > trying to build something with it.
> >>> >> >
> >>> >> > On Thu, Sep 22, 2022 at 5:40 AM Antoine Pitrou <anto...@python.org> wrote:
> >>> >> >
> >>> >> >> Hello,
> >>> >> >>
> >>> >> >> I would urge people to review the proposed ADBC APIs,
> >>> >> >> especially the Go and Java APIs, which probably benefitted from
> >>> >> >> less feedback than the C one.
> >>> >> >>
> >>> >> >> Regards
> >>> >> >>
> >>> >> >> Antoine.
> >>> >> >>
> >>> >> >> On 21/09/2022 at 17:40, David Li wrote:
> >>> >> >> > Hello,
> >>> >> >> >
> >>> >> >> > We have been discussing [1] standard interfaces for
> >>> >> >> > Arrow-based database access and have been working on
> >>> >> >> > implementations of the proposed interfaces [2], all under the
> >>> >> >> > name "ADBC". This proposal aims to provide a unified client
> >>> >> >> > abstraction across Arrow-native database protocols (like
> >>> >> >> > Flight SQL) and non-Arrow database protocols, which can then
> >>> >> >> > be used by Arrow projects like Dataset/Acero and ecosystem
> >>> >> >> > projects like Ibis.
> >>> >> >> >
> >>> >> >> > For details, see the RFC here:
> >>> >> >> > https://github.com/apache/arrow/pull/14079
> >>> >> >> >
> >>> >> >> > I would like to propose that the Arrow project adopt this
> >>> >> >> > RFC, along with apache/arrow-adbc commit 7866a56 [3], as
> >>> >> >> > version 1.0.0 of the ADBC API standard.
> >>> >> >> >
> >>> >> >> > Please vote to adopt the specification as described above.
> >>> >> >> > (This is not a vote to release any components.)
> >>> >> >> >
> >>> >> >> > This vote will be open for at least 72 hours.
> >>> >> >> >
> >>> >> >> > [ ] +1 Adopt the ADBC specification
> >>> >> >> > [ ] 0
> >>> >> >> > [ ] -1 Do not adopt the specification because...
> >>> >> >> >
> >>> >> >> > Thanks to the DuckDB and R DBI projects for providing
> >>> >> >> > feedback on and implementations of the proposal.
> >>> >> >> >
> >>> >> >> > [1]: https://lists.apache.org/thread/cq7t9s5p7dw4vschylhwsfgqwkr5fmf2
> >>> >> >> > [2]: https://github.com/apache/arrow-adbc
> >>> >> >> > [3]: https://github.com/apache/arrow-adbc/commit/7866a566f5b7b635267bfb7a87ea49b01dfe89fa
> >>> >> >> >
> >>> >> >> > Thank you,
> >>> >> >> > David
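
P.S. To make the list[utf8] point above concrete, here is a simplified
model in plain Python of how a list<utf8> column lays out nested name
parts as one flat values array plus an offsets array (this is only an
illustration of the idea; real Arrow additionally flattens the strings
themselves into a data buffer with their own offsets, plus a validity
bitmap).

```python
def encode_list_column(rows):
    """Model a list<utf8> column: flat values + one offsets array.

    There is no per-row list allocation; row boundaries live entirely
    in the offsets array, which is why the overhead is small.
    """
    values, offsets = [], [0]
    for row in rows:
        values.extend(row)
        offsets.append(len(values))
    return values, offsets

def get_row(values, offsets, i):
    """Recover row i as a slice of the flat values array."""
    return values[offsets[i]:offsets[i + 1]]

rows = [["foo", "bar", "table1"], ["qux", "table2"]]
values, offsets = encode_list_column(rows)
# values  == ["foo", "bar", "table1", "qux", "table2"]
# offsets == [0, 3, 5]
assert get_row(values, offsets, 1) == ["qux", "table2"]
```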