Got it, thank you David! I started prototyping the implementation last night; hopefully I will make some good progress and have something basic functioning soon.

RE: The metadata thing -- I think both Calcite and Teiid have solid interfaces for defining what capabilities a datasource has:

https://github.com/teiid/teiid/blob/8e9057a46be009d68b2d67701781f1f8c175baa7/api/src/main/java/org/teiid/translator/ExecutionFactory.java#L349-L1528

It's probably not possible to make something universal, but it seems like you could get pretty close to covering the most common functionality/capabilities.

On Sat, Mar 5, 2022 at 11:48 PM Kyle Porter <ky...@bitquilltech.com.invalid> wrote:

> Yes, we should, where possible, avoid any one-off metadata. This is where
> other standards fail in that applications must be custom built for each
> data source; if we standardize the metadata then applications can at least
> be built to adapt.
>
> On Sat., Mar. 5, 2022, 6:54 p.m. David Li, <lidav...@apache.org> wrote:
>
> > Yes, GetSqlInfo reserves a range of metadata IDs for Flight SQL's use, so
> > the application can use others for its own purposes. That said if they seem
> > commonly applicable maybe we should try to standardize them.
> >
> > I think what you are doing should be reasonable. You may not need _all_ of
> > the capabilities in Flight SQL for this (e.g. all the various metadata
> > calls, or prepared statements, perhaps) but I don't see why it wouldn't
> > work for you.
> >
> > On Fri, Mar 4, 2022, at 19:03, Gavin Ray wrote:
> > > To touch on the question about supported features -- is it possible to
> > > advertise arbitrary/custom "capabilities" in GetSqlInfo?
> > > Say that you want to represent some set of behaviors that FlightSQL
> > > services can support.
> > >
> > > Stuff like "Supports grouping by multiple distinct aggregates", "Supports
> > > self-joins on aliased tables" etc
> > > This is going to be unique to each implementation, but I couldn't determine
> > > whether there was a way to express arbitrary capabilities
> > >
> > > Also, in case it's helpful I put together an ASCII diagram of what I'm
> > > trying to do with FlightSQL
> > > If anyone has a moment, would appreciate input on whether it's feasible/a
> > > good idea
> > >
> > > https://pastebin.com/raw/VF2r0F3f
> > >
> > > Thank you =)
> > >
> > >
> > > On Fri, Mar 4, 2022 at 2:37 PM David Li <lidav...@apache.org> wrote:
> > >
> > >> We could also add say CommandSubstraitQuery as a distinct message, and
> > >> older servers would just reject it as an unknown request type.
> > >>
> > >> -David
> > >>
> > >> On Fri, Mar 4, 2022, at 17:01, Micah Kornfield wrote:
> > >> >>
> > >> >> 1. How does a server report that it supports each command type? Initial
> > >> >> thought is a property in GetSqlInfo.
> > >> >
> > >> > This sounds reasonable.
> > >> >
> > >> >> What happens to client code written prior to changing the command type
> > >> >> to be a oneOf field? Same for servers.
> > >> >
> > >> > It is transparent from older clients (I'm 99% sure the wire protocol
> > >> > doesn't change). Servers is a little harder. The one saving grace is I
> > >> > don't think an empty/not-present SQL string would be something most servers
> > >> > could handle, so they would probably error with something that while
> > >> > not-obvious would give a clue to the clients (but hopefully this would be a
> > >> > non-issue because the capabilities would be checked for clients wishing
> > >> > to use this feature first).
> > >> >
> > >> > -Micah
> > >> >
> > >> > On Fri, Mar 4, 2022 at 1:50 PM James Duong <jam...@bitquilltech.com.invalid>
> > >> > wrote:
> > >> >
> > >> >> It sounds like an interesting and useful project to use Substrait as an
> > >> >> alternative to SQL strings.
> > >> >>
> > >> >> Important aspects to spec out are:
> > >> >> 1. How does a server report that it supports each command type? Initial
> > >> >> thought is a property in GetSqlInfo.
> > >> >> 2. What happens to client code written prior to changing the command type
> > >> >> to be a oneOf field? Same for servers.
> > >> >> More generally, how should backward compatibility work, and what should
> > >> >> happen if a client sends an unsupported
> > >> >> command type to a server?
> > >> >> 3. Should inputs to catalog RPC calls also accept Substrait structures?
> > >> >>
> > >> >> On Thu, Mar 3, 2022 at 11:00 PM Gavin Ray <ray.gavi...@gmail.com> wrote:
> > >> >>
> > >> >> > @James Duong <jam...@bitquilltech.com>
> > >> >> >
> > >> >> > You are absolutely right, I realized this and confirmed whether this
> > >> >> > would be possible with Jacques to double-check.
> > >> >> > It would amount to what I might call "dollar-store Substrait." It's not
> > >> >> > elegant or a good solution, but definitely presents a good duct-tape hack
> > >> >> > and is a crafty idea.
> > >> >> >
> > >> >> > I agree with Jacques -- when you think about FlightSQL, what you are
> > >> >> > attempting with a query isn't necessarily SQL, but a general data-compute
> > >> >> > operation.
> > >> >> > SQL just so happens to be a fairly universal way to express them, with an
> > >> >> > ANSI standard, but FlightSQL doesn't recognize any particular subset of it
> > >> >> > and for all intents and purposes it doesn't matter what the operation
> > >> >> > string contains.
> > >> >> >
> > >> >> > Substrait would make a fantastic logical next-feature because it's
> > >> >> > targeted as a specification for expressing relational algebra and
> > >> >> > data-compute operations
> > >> >> > This more-or-less equates to SQL strings (in my mind at least) with a much
> > >> >> > better toolkit and Dev UX. If there is anything I can do to help move this
> > >> >> > forward, please let me know because I am extremely motivated to do so.
> > >> >> >
> > >> >> > @David Li <git...@lidavidm.me>
> > >> >> >
> > >> >> > Also agreed. Substrait is put together by folks much smarter than myself,
> > >> >> > and if I had to hedge my bets, I'd put money on it being the future of
> > >> >> > data-compute interop.
> > >> >> > I would love nothing more than to adopt this technology and push it along.
> > >> >> >
> > >> >> >> Your project does sound interesting - basically, it sounds like a tabular
> > >> >> >> data storage service with query pushdown?
> > >> >> >>
> > >> >> >
> > >> >> > Yeah this is more or less the details of it (my personal email, with
> > >> >> > discretion assumed, is always open)
> > >> >> >
> > >> >> > Imagine an environment where a backend wants to advertise some kind of
> > >> >> > schema/data catalog
> > >> >> >
> > >> >> > And then a central service introspects these backends, and dynamically
> > >> >> > generates an API from the data catalogues/schemas, where requests get
> > >> >> > proxied to the underlying backend service for each schema to actually be
> > >> >> > executed
> > >> >> >
> > >> >> > In text, the flow would look something like:
> > >> >> >
> > >> >> >                                                    <----> Data Provider Backend 0
> > >> >> > Client <-----> Central Service <---> Generated API <----> Data-Provider Backend 1
> > >> >> >                                                    <----> Data Provider Backend 2
> > >> >> >
> > >> >> > On Thu, Mar 3, 2022 at 5:52 PM David Li <lidav...@apache.org> wrote:
> > >> >> >
> > >> >> >> Gavin, thanks for sharing. I'm not so sure you'll find an alternative to
> > >> >> >> Substrait, at least one that isn't even more nascent or one that's very
> > >> >> >> tied to a particular language, so perhaps it might be better to get
> > >> >> >> involved in Substrait and see if it suits your needs? Convincing a team to
> > >> >> >> try something new can be hard, though, and it is somewhat of a moving
> > >> >> >> target - but Flight SQL is in a similar spot, I think, as it's still
> > >> >> >> getting enhancements.
> > >> >> >>
> > >> >> >> Your project does sound interesting - basically, it sounds like a tabular
> > >> >> >> data storage service with query pushdown?
> > >> >> >>
> > >> >> >> On Thu, Mar 3, 2022, at 19:58, Jacques Nadeau wrote:
> > >> >> >> > James, I agree that you could use JSON but that feels a bit hacky
> > >> >> >> > (mis-use of the paradigm). Instead, I'd really like to do something like
> > >> >> >> > David is suggesting: support Substrait as an alternative to a SQL string.
> > >> >> >> > Something like this:
> > >> >> >> >
> > >> >> >> > https://github.com/jacques-n/arrow/commit/e22674fa882e77c2889cf95f69f6e3701db362bc
> > >> >> >> >
> > >> >> >> > It would be great if someone wanted to pick this up. It would be a nice
> > >> >> >> > enhancement to FlightSQL (and provide a structured way to express
> > >> >> >> > operations).
> > >> >> >> >
> > >> >> >> > On Thu, Mar 3, 2022 at 4:56 PM James Duong <jam...@bitquilltech.com.invalid>
> > >> >> >> > wrote:
> > >> >> >> >
> > >> >> >> >> In the same way that you could write an ODBC driver that takes in text
> > >> >> >> >> that's not SQL, you could write a Flight SQL server that takes in text
> > >> >> >> >> that's JSON.
> > >> >> >> >> Flight SQL doesn't parse the query, so you could create commands that are
> > >> >> >> >> just JSON text.
> > >> >> >> >>
> > >> >> >> >> Is that the only bit you need, Gavin?
> > >> >> >> >>
> > >> >> >> >> On Thu, Mar 3, 2022 at 4:26 PM Gavin Ray <ray.gavi...@gmail.com> wrote:
> > >> >> >> >>
> > >> >> >> >> > I am enthusiastic about Substrait and have followed its progress eagerly
> > >> >> >> >> > =D
> > >> >> >> >> >
> > >> >> >> >> > When I presented it as a tentative option, there were reservations because
> > >> >> >> >> > of the project/spec being young and the functionality still being
> > >> >> >> >> > fleshed out.
> > >> >> >> >> > I think if I were having this conversation in say, 8-16 months, it would
> > >> >> >> >> > have been an easy choice, no doubt.
> > >> >> >> >> >
> > >> >> >> >> > On a public mailing list (and I can share more details in private if you're
> > >> >> >> >> > curious), the gist of it is this:
> > >> >> >> >> >
> > >> >> >> >> > Some well-defined/backed-by-mature tech solution for expressing data
> > >> >> >> >> > compute operations between services would be a useful thing to have
> > >> >> >> >> > (Especially if it's language-agnostic)
> > >> >> >> >> >
> > >> >> >> >> > The goal is for an "implementing service" to have:
> > >> >> >> >> > - An introspectable schema (IE, "describe yourself to me")
> > >> >> >> >> > - A query/operation execution endpoint (IE: "perform this operation on your
> > >> >> >> >> >   data")
> > >> >> >> >> >
> > >> >> >> >> > With FlightSQL this is possible I believe, but it requires the operation to
> > >> >> >> >> > be expressed as a SQL string which isn't ideal.
> > >> >> >> >> >
> > >> >> >> >> > Working with some programmatic, structured object that has the same
> > >> >> >> >> > semantics ("Logical Plan", or whatnot) as a SQL query would have, would be
> > >> >> >> >> > a better experience
> > >> >> >> >> > (Jacques is on to something here!)
> > >> >> >> >> >
> > >> >> >> >> > This interface between services would be somewhat the equivalent of an
> > >> >> >> >> > "SDK", so it would be nice to have a strongly-typed library for expressing
> > >> >> >> >> > and building-up query/data-compute ops.
> > >> >> >> >> >
> > >> >> >> >> >
> > >> >> >> >> > On Thu, Mar 3, 2022 at 3:17 PM David Li <lidav...@apache.org> wrote:
> > >> >> >> >> >
> > >> >> >> >> > > You probably want Substrait: https://substrait.io/
> > >> >> >> >> > >
> > >> >> >> >> > > Which is being worked on by several people, including Arrow community
> > >> >> >> >> > > members.
> > >> >> >> >> > >
> > >> >> >> >> > > It might be interesting to generalize Flight SQL to include support for
> > >> >> >> >> > > Substrait. I'm curious what your application is, if you're able to share
> > >> >> >> >> > > more.
> > >> >> >> >> > >
> > >> >> >> >> > > -David
> > >> >> >> >> > >
> > >> >> >> >> > > On Thu, Mar 3, 2022, at 18:05, Gavin Ray wrote:
> > >> >> >> >> > > > Hiya,
> > >> >> >> >> > > >
> > >> >> >> >> > > > I am drafting a proposal for a way to enable services to express data
> > >> >> >> >> > > > compute operations to each other.
> > >> >> >> >> > > >
> > >> >> >> >> > > > However I think it'll be difficult to get buy-in if the only representation
> > >> >> >> >> > > > for queries is as SQL strings.
> > >> >> >> >> > > >
> > >> >> >> >> > > > Is there any kind of lower-level API that can be used to express
> > >> >> >> >> > > > operations?
> > >> >> >> >> > > >
> > >> >> >> >> > > > IE instead of "SELECT name FROM user"
> > >> >> >> >> > > >
> > >> >> >> >> > > > A structured representation like:
> > >> >> >> >> > > > {
> > >> >> >> >> > > >   "op": "query",
> > >> >> >> >> > > >   "schema": "user",
> > >> >> >> >> > > >   "project": ["name"]
> > >> >> >> >> > > > }
> > >> >> >> >> > > >
> > >> >> >> >> > > > Or maybe this is a bad idea/doesn't make sense?
> > >> >> >> >> > > >
> > >> >> >> >> > > > Thank you =)
> > >> >> >> >> > > >
> > >> >> >> >>
> > >> >> >> >> --
> > >> >> >> >>
> > >> >> >> >> *James Duong*
> > >> >> >> >> Lead Software Developer
> > >> >> >> >> Bit Quill Technologies Inc.
> > >> >> >> >> Direct: +1.604.562.6082 | jam...@bitquilltech.com
> > >> >> >> >> https://www.bitquilltech.com
> > >> >> >> >>
> > >> >> >> >> This email message is for the sole use of the intended recipient(s) and may
> > >> >> >> >> contain confidential and privileged information. Any unauthorized review,
> > >> >> >> >> use, disclosure, or distribution is prohibited. If you are not the
> > >> >> >> >> intended recipient, please contact the sender by reply email and destroy
> > >> >> >> >> all copies of the original message. Thank you.
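
P.S. To make the capability discussion in the thread a bit more concrete, here is a rough sketch of how custom capabilities might sit next to the reserved GetSqlInfo IDs, and how a client could check them before choosing between a SQL string and a Substrait plan. This is plain Python with no Arrow dependency. The 10_000 boundary for custom IDs, the capability names, and the CapabilityRegistry / choose_command helpers are all assumptions made up for illustration, not the Flight SQL API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Optional

# Assumption: IDs below this value are reserved for Flight SQL's own SqlInfo
# entries. Check FlightSql.proto for the actual reserved range.
CUSTOM_ID_START = 10_000

# Hypothetical custom capability IDs; none of these exist in any spec.
SUPPORTS_SUBSTRAIT = CUSTOM_ID_START + 1
SUPPORTS_SELF_JOIN_ON_ALIASES = CUSTOM_ID_START + 2
SUPPORTS_GROUP_BY_MULTI_DISTINCT = CUSTOM_ID_START + 3


@dataclass
class CapabilityRegistry:
    """In-memory stand-in for what a server might return from GetSqlInfo."""

    values: Dict[int, Any] = field(default_factory=dict)

    def advertise(self, info_id: int, value: Any) -> None:
        # Refuse IDs in the range this sketch assumes is reserved.
        if info_id < CUSTOM_ID_START:
            raise ValueError(f"ID {info_id} is in the range assumed reserved")
        self.values[info_id] = value

    def get_sql_info(self, requested: Optional[Iterable[int]] = None) -> Dict[int, Any]:
        """Return everything when no IDs are given, else only the known ones."""
        if requested is None:
            return dict(self.values)
        return {i: self.values[i] for i in requested if i in self.values}


def choose_command(server_info: Dict[int, Any]) -> str:
    """Client side: send a structured plan only if the server advertises it."""
    if server_info.get(SUPPORTS_SUBSTRAIT) is True:
        return "substrait"
    return "sql"


if __name__ == "__main__":
    server = CapabilityRegistry()
    server.advertise(SUPPORTS_SUBSTRAIT, True)
    server.advertise(SUPPORTS_SELF_JOIN_ON_ALIASES, False)
    info = server.get_sql_info([SUPPORTS_SUBSTRAIT, SUPPORTS_GROUP_BY_MULTI_DISTINCT])
    print(choose_command(info))
```

An older server that never advertises the Substrait capability falls through to the SQL string path, which is the "check capabilities first" behaviour Micah describes above.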