I quickly drafted these out (sans implementation so far): https://github.com/apache/arrow/pull/13492
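
For anyone skimming the thread, here is a rough sketch of the "transaction as an opaque ID" idea, handled the way prepared statement handles are. This is not the API in the PR above. Every name in it (ToyServer, begin_transaction, execute_update, end_transaction, EndAction) is hypothetical, and a plain in-memory Python class stands in for the server so that it runs on its own.

```python
import secrets
from enum import Enum
from typing import Optional


class EndAction(Enum):
    COMMIT = "commit"
    ROLLBACK = "rollback"


class ToyServer:
    """In-memory stand-in for a server; hypothetical, not the Flight SQL API."""

    def __init__(self) -> None:
        self.committed = []  # statements that have taken effect
        self._pending = {}   # opaque transaction ID -> buffered statements

    def begin_transaction(self) -> bytes:
        # The client treats these bytes as opaque, like a prepared
        # statement handle, and passes them back on later calls.
        txn_id = secrets.token_bytes(16)
        self._pending[txn_id] = []
        return txn_id

    def execute_update(self, sql: str, txn_id: Optional[bytes] = None) -> None:
        if txn_id is None:
            self.committed.append(sql)  # no transaction: auto-commit
            return
        if txn_id not in self._pending:
            raise KeyError("unknown or already-ended transaction")
        self._pending[txn_id].append(sql)

    def end_transaction(self, txn_id: bytes, action: EndAction) -> None:
        statements = self._pending.pop(txn_id)  # KeyError if unknown
        if action is EndAction.COMMIT:
            self.committed.extend(statements)
        # On ROLLBACK the buffered statements are simply dropped.


server = ToyServer()
txn = server.begin_transaction()
server.execute_update("INSERT INTO t VALUES (1)", txn)
server.end_transaction(txn, EndAction.ROLLBACK)
assert server.committed == []  # rolled back, so nothing was applied
```

The handle only means something to the server that issued it, which is where the statefulness concern in the quoted discussion comes from: behind a load balancer, later calls carrying the handle have to reach a server that can resolve it.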
On Thu, Jun 30, 2022, at 21:20, David Li wrote:
> Ah - somehow I didn't think of that. Yes, we should just implement it
> in the same way prepared statements are already implemented.
>
> On Thu, Jun 30, 2022, at 19:42, Micah Kornfield wrote:
>>>
>>> It would also then be good to make explicit the statefulness of
>>> connections in Flight SQL. While that is sort of an obvious constraint, it
>>> is at odds with how gRPC is usually used (especially in the presence of
>>> load balancing).
>>
>>
>> I'm not sure I understand where the statefulness requirements come in?
>> Could you elaborate? It seems that a transaction could be an opaque ID on
>> operations?
>>
>> On Thu, Jun 30, 2022 at 2:47 PM James Duong <jam...@bitquilltech.com.invalid>
>> wrote:
>>
>>> This is a bit of a tangent from the original discussion about
>>> Substrait integration.
>>>
>>> Flight SQL would definitely benefit from transaction RPC commands for
>>> building bridge drivers. I'm also wondering if there should be an RPC call
>>> to cancel a running query, as opposed to just having the client terminate
>>> streams. This would allow a multi-process application to cancel work across
>>> processes.
>>>
>>> On Thu, Jun 30, 2022 at 1:35 PM David Li <lidav...@apache.org> wrote:
>>>
>>> > Reviving this discussion: would people be interested in seeing a
>>> > sketched-out CommandSubstraitQuery et al.?
>>> >
>>> > Additionally, while working on ADBC, I realized: does Flight SQL need
>>> > explicit Commit/Rollback commands? This would presumably be necessary if we
>>> > want to build ODBC/JDBC drivers on top, since those standards have explicit
>>> > commands, and Flight SQL doesn't have the luxury of a driver to issue
>>> > database-specific SQL to implement these.
>>> >
>>> > It would also then be good to make explicit the statefulness of
>>> > connections in Flight SQL.
>>> > While that is sort of an obvious constraint, it
>>> > is at odds with how gRPC is usually used (especially in the presence of
>>> > load balancing).
>>> >
>>> > On Sun, Mar 6, 2022, at 14:44, Gavin Ray wrote:
>>> > > Got it, thank you David!
>>> > > I started prototyping the implementation last night, hopefully I will make
>>> > > some good progress and have something basic functioning soon.
>>> > >
>>> > > RE: The metadata thing -- I think both Calcite and Teiid have solid
>>> > > interfaces for defining what capabilities a datasource has.
>>> > >
>>> > > https://github.com/teiid/teiid/blob/8e9057a46be009d68b2d67701781f1f8c175baa7/api/src/main/java/org/teiid/translator/ExecutionFactory.java#L349-L1528
>>> > >
>>> > > It's probably not possible to make something universal, but it seems like
>>> > > you could get pretty close to most common functionality/capabilities
>>> > >
>>> > >
>>> > > On Sat, Mar 5, 2022 at 11:48 PM Kyle Porter <ky...@bitquilltech.com.invalid>
>>> > > wrote:
>>> > >
>>> > >> Yes, we should, where possible, avoid any one-off metadata. This is where
>>> > >> other standards fail, in that applications must be custom built for each
>>> > >> data source; if we standardize the metadata then applications can at least
>>> > >> be built to adapt.
>>> > >>
>>> > >> On Sat., Mar. 5, 2022, 6:54 p.m. David Li, <lidav...@apache.org> wrote:
>>> > >>
>>> > >> > Yes, GetSqlInfo reserves a range of metadata IDs for Flight SQL's use, so
>>> > >> > the application can use others for its own purposes. That said, if they seem
>>> > >> > commonly applicable maybe we should try to standardize them.
>>> > >> >
>>> > >> > I think what you are doing should be reasonable. You may not need _all_ of
>>> > >> > the capabilities in Flight SQL for this (e.g. all the various metadata
>>> > >> > calls, or prepared statements, perhaps) but I don't see why it wouldn't
>>> > >> > work for you.
>>> > >> >
>>> > >> > On Fri, Mar 4, 2022, at 19:03, Gavin Ray wrote:
>>> > >> > > To touch on the question about supported features -- is it possible to
>>> > >> > > advertise arbitrary/custom "capabilities" in GetSqlInfo?
>>> > >> > > Say that you want to represent some set of behaviors that FlightSQL
>>> > >> > > services can support.
>>> > >> > >
>>> > >> > > Stuff like "Supports grouping by multiple distinct aggregates", "Supports
>>> > >> > > self-joins on aliased tables" etc
>>> > >> > > This is going to be unique to each implementation, but I couldn't determine
>>> > >> > > whether there was a way to express arbitrary capabilities
>>> > >> > >
>>> > >> > > Also, in case it's helpful I put together an ASCII diagram of what I'm
>>> > >> > > trying to do with FlightSQL
>>> > >> > > If anyone has a moment, would appreciate input on whether it's feasible/a
>>> > >> > > good idea
>>> > >> > >
>>> > >> > > https://pastebin.com/raw/VF2r0F3f
>>> > >> > >
>>> > >> > > Thank you =)
>>> > >> > >
>>> > >> > >
>>> > >> > > On Fri, Mar 4, 2022 at 2:37 PM David Li <lidav...@apache.org> wrote:
>>> > >> > >
>>> > >> > >> We could also add say CommandSubstraitQuery as a distinct message, and
>>> > >> > >> older servers would just reject it as an unknown request type.
>>> > >> > >>
>>> > >> > >> -David
>>> > >> > >>
>>> > >> > >> On Fri, Mar 4, 2022, at 17:01, Micah Kornfield wrote:
>>> > >> > >> >>
>>> > >> > >> >> 1. How does a server report that it supports each command type? Initial
>>> > >> > >> >> thought is a property in GetSqlInfo.
>>> > >> > >> >
>>> > >> > >> >
>>> > >> > >> > This sounds reasonable.
>>> > >> > >> >
>>> > >> > >> >
>>> > >> > >> >> What happens to client code written prior to changing the command type
>>> > >> > >> >> to be a oneOf field? Same for servers.
>>> > >> > >> >
>>> > >> > >> >
>>> > >> > >> > It is transparent to older clients (I'm 99% sure the wire protocol
>>> > >> > >> > doesn't change). Servers are a little harder. The one saving grace is I
>>> > >> > >> > don't think an empty/not-present SQL string would be something most servers
>>> > >> > >> > could handle, so they would probably error with something that, while
>>> > >> > >> > not obvious, would give a clue to the clients (but hopefully this would be a
>>> > >> > >> > non-issue because the capabilities would be checked for clients wishing to
>>> > >> > >> > use this feature first).
>>> > >> > >> >
>>> > >> > >> > -Micah
>>> > >> > >> >
>>> > >> > >> > On Fri, Mar 4, 2022 at 1:50 PM James Duong <jam...@bitquilltech.com.invalid>
>>> > >> > >> > wrote:
>>> > >> > >> >
>>> > >> > >> >> It sounds like an interesting and useful project to use Substrait as an
>>> > >> > >> >> alternative to SQL strings.
>>> > >> > >> >>
>>> > >> > >> >> Important aspects to spec out are:
>>> > >> > >> >> 1. How does a server report that it supports each command type? Initial
>>> > >> > >> >> thought is a property in GetSqlInfo.
>>> > >> > >> >> 2. What happens to client code written prior to changing the command type
>>> > >> > >> >> to be a oneOf field? Same for servers.
>>> > >> > >> >> More generally, how should backward compatibility work, and what should
>>> > >> > >> >> happen if a client sends an unsupported
>>> > >> > >> >> command type to a server?
>>> > >> > >> >> 3. Should inputs to catalog RPC calls also accept Substrait structures?
>>> > >> > >> >> >>> > >> > >> >> On Thu, Mar 3, 2022 at 11:00 PM Gavin Ray < >>> > ray.gavi...@gmail.com> >>> > >> > >> wrote: >>> > >> > >> >> >>> > >> > >> >> > @James Duong <jam...@bitquilltech.com> >>> > >> > >> >> > >>> > >> > >> >> > You are absolutely right, I realized this and confirmed >>> > whether >>> > >> > this >>> > >> > >> >> > would be possible with Jacques to double-check. >>> > >> > >> >> > It would amount to what I might call "dollar-store >>> Substrait." >>> > >> It's >>> > >> > >> not >>> > >> > >> >> > elegant or a good solution, but definitely presents a good >>> > >> > duct-tape >>> > >> > >> hack >>> > >> > >> >> > and is a crafty idea. >>> > >> > >> >> > >>> > >> > >> >> > I agree with Jacques -- when you think about FlightSQL, what >>> > you >>> > >> > are >>> > >> > >> >> > attempting with a query isn't necessarily SQL, but a general >>> > >> > >> data-compute >>> > >> > >> >> > operation. >>> > >> > >> >> > SQL just so happens to be a fairly universal way to express >>> > them, >>> > >> > >> with an >>> > >> > >> >> > ANSI standard, but FlightSQL doesn't recognize any >>> particular >>> > >> > subset >>> > >> > >> of >>> > >> > >> >> it >>> > >> > >> >> > and for all intents and purposes it doesn't matter what the >>> > >> > operation >>> > >> > >> >> > string contains. >>> > >> > >> >> > >>> > >> > >> >> > Substrait would make a fantastic logical next-feature >>> because >>> > >> it's >>> > >> > >> >> > targeted as a specification for expressing relational >>> algebra >>> > and >>> > >> > >> >> > data-compute operations >>> > >> > >> >> > This more-or-less equates to SQL strings (in my mind at >>> least) >>> > >> > with a >>> > >> > >> >> much >>> > >> > >> >> > better toolkit and Dev UX. If there is anything I can do to >>> > help >>> > >> > move >>> > >> > >> >> this >>> > >> > >> >> > forward, please let me know because I am extremely motivated >>> > to >>> > >> do >>> > >> > so. 
>>> > >> > >> >> > >>> > >> > >> >> > @David Li <git...@lidavidm.me> >>> > >> > >> >> > >>> > >> > >> >> > Also agreed. Substrait is put together by folks much smarter >>> > than >>> > >> > >> myself, >>> > >> > >> >> > and if I had to hedge my bets, I'd put money on it being the >>> > >> > future of >>> > >> > >> >> > data-compute interop. >>> > >> > >> >> > I would love nothing more than to adopt this technology and >>> > push >>> > >> it >>> > >> > >> >> along. >>> > >> > >> >> > >>> > >> > >> >> > Your project does sound interesting - basically, it sounds >>> > like a >>> > >> > >> tabular >>> > >> > >> >> >> data storage service with query pushdown? >>> > >> > >> >> >> >>> > >> > >> >> > >>> > >> > >> >> > Yeah this is more or less the details of it (my personal >>> > email, >>> > >> > with >>> > >> > >> >> > discretion assumed, is always open) >>> > >> > >> >> > >>> > >> > >> >> > Imagine an environment where a backend wants to advertise >>> some >>> > >> > kind of >>> > >> > >> >> > schema/data catalog >>> > >> > >> >> > >>> > >> > >> >> > And then a central service introspects these backends, and >>> > >> > dynamically >>> > >> > >> >> > generates an API from the data catalogues/schemas, where >>> > requests >>> > >> > get >>> > >> > >> >> > proxied to the underlying backend service for each schema to >>> > >> > actually >>> > >> > >> be >>> > >> > >> >> > executed >>> > >> > >> >> > >>> > >> > >> >> > In text, the flow would look something like: >>> > >> > >> >> > >>> > >> > >> >> > >>> > >> > >> >> > <----> Data Provider Backend 0 >>> > >> > >> >> > Client <-----> Central Service <---> Generated API <----> >>> > >> > >> Data-Provider >>> > >> > >> >> > Backend 1 >>> > >> > >> >> > >>> > >> > >> >> > <----> Data Provider Backend 2 >>> > >> > >> >> > >>> > >> > >> >> > >>> > >> > >> >> > >>> > >> > >> >> > On Thu, Mar 3, 2022 at 5:52 PM David Li < >>> lidav...@apache.org> >>> > >> > wrote: >>> > >> > >> >> > >>> > >> > >> >> >> Gavin, thanks for sharing. 
I'm not so sure you'll find an >>> > >> > >> alternative to >>> > >> > >> >> >> Substrait, at least one that isn't even more nascent or one >>> > >> that's >>> > >> > >> very >>> > >> > >> >> >> tied to a particular language, so perhaps it might be >>> better >>> > to >>> > >> > get >>> > >> > >> >> >> involved in Substrait and see if it suits your needs? >>> > >> Convincing a >>> > >> > >> team >>> > >> > >> >> to >>> > >> > >> >> >> try something new can be hard, though, and it is somewhat >>> of >>> > a >>> > >> > moving >>> > >> > >> >> >> target - but Flight SQL is in a similar spot, I think, as >>> > it's >>> > >> > still >>> > >> > >> >> >> getting enhancements. >>> > >> > >> >> >> >>> > >> > >> >> >> Your project does sound interesting - basically, it sounds >>> > like >>> > >> a >>> > >> > >> >> tabular >>> > >> > >> >> >> data storage service with query pushdown? >>> > >> > >> >> >> >>> > >> > >> >> >> On Thu, Mar 3, 2022, at 19:58, Jacques Nadeau wrote: >>> > >> > >> >> >> > James, I agree that you could use JSON but that feels a >>> bit >>> > >> > hacky >>> > >> > >> >> >> > (mis-use >>> > >> > >> >> >> > of the paradigm). Instead, I'd really like to do >>> something >>> > >> like >>> > >> > >> David >>> > >> > >> >> is >>> > >> > >> >> >> > suggesting: support Substrait as an alternative to a SQL >>> > >> string. >>> > >> > >> >> >> > Something like this: >>> > >> > >> >> >> > >>> > >> > >> >> >> >>> > >> > >> >> >>> > >> > >> >>> > >> > >>> > >> >>> > >>> https://github.com/jacques-n/arrow/commit/e22674fa882e77c2889cf95f69f6e3701db362bc >>> > >> > >> >> >> > >>> > >> > >> >> >> > It would be great if someone wanted to pick this up. It >>> > would >>> > >> > be a >>> > >> > >> >> nice >>> > >> > >> >> >> > enhancement to FlightSQL (and provide a structured way to >>> > >> > express >>> > >> > >> >> >> > operations). 
>>> > >> > >> >> >> > >>> > >> > >> >> >> > >>> > >> > >> >> >> > >>> > >> > >> >> >> > On Thu, Mar 3, 2022 at 4:56 PM James Duong < >>> > >> > >> jam...@bitquilltech.com >>> > >> > >> >> >> .invalid> >>> > >> > >> >> >> > wrote: >>> > >> > >> >> >> > >>> > >> > >> >> >> >> In the same way that you could write an ODBC driver that >>> > >> takes >>> > >> > in >>> > >> > >> >> text >>> > >> > >> >> >> >> that's not SQL, you could write a Flight SQL server that >>> > >> takes >>> > >> > in >>> > >> > >> >> text >>> > >> > >> >> >> >> that's JSON. >>> > >> > >> >> >> >> Flight SQL doesn't parse the query, so you could create >>> > >> > commands >>> > >> > >> that >>> > >> > >> >> >> are >>> > >> > >> >> >> >> just JSON text. >>> > >> > >> >> >> >> >>> > >> > >> >> >> >> Is that the only bit you need, Gavin? >>> > >> > >> >> >> >> >>> > >> > >> >> >> >> On Thu, Mar 3, 2022 at 4:26 PM Gavin Ray < >>> > >> > ray.gavi...@gmail.com> >>> > >> > >> >> >> wrote: >>> > >> > >> >> >> >> >>> > >> > >> >> >> >> > I am enthusiastic about Substrait and have followed >>> it's >>> > >> > >> progress >>> > >> > >> >> >> eagerly >>> > >> > >> >> >> >> > =D >>> > >> > >> >> >> >> > >>> > >> > >> >> >> >> > When I presented it as a tentative option, there were >>> > >> > >> reservations >>> > >> > >> >> >> >> because >>> > >> > >> >> >> >> > of the project/spec being young and the functionality >>> > still >>> > >> > >> being >>> > >> > >> >> >> >> > fleshed out. >>> > >> > >> >> >> >> > I think if I were having this conversation in say, >>> 8-16 >>> > >> > months, >>> > >> > >> it >>> > >> > >> >> >> would >>> > >> > >> >> >> >> > have been an easy choice, no doubt. 
>>> > >> > >> >> >> >> > >>> > >> > >> >> >> >> > On a public mailing list (and I can share more details >>> > in >>> > >> > >> private >>> > >> > >> >> if >>> > >> > >> >> >> >> you're >>> > >> > >> >> >> >> > curious), the gist of it is this: >>> > >> > >> >> >> >> > >>> > >> > >> >> >> >> > Some well-defined/backed-by-mature tech solution for >>> > >> > expressing >>> > >> > >> >> data >>> > >> > >> >> >> >> > compute operations between services would be a useful >>> > thing >>> > >> > to >>> > >> > >> have >>> > >> > >> >> >> >> > (Especially if it's language-agnostic) >>> > >> > >> >> >> >> > >>> > >> > >> >> >> >> > The goal is for an "implementing service" to have: >>> > >> > >> >> >> >> > - An introspectable schema (IE, "describe yourself to >>> > me") >>> > >> > >> >> >> >> > - A query/operation execution endpoint (IE: "perform >>> > this >>> > >> > >> operation >>> > >> > >> >> >> on >>> > >> > >> >> >> >> your >>> > >> > >> >> >> >> > data") >>> > >> > >> >> >> >> > >>> > >> > >> >> >> >> > With FlightSQL this is possible I believe, but it >>> > requires >>> > >> > the >>> > >> > >> >> >> operation >>> > >> > >> >> >> >> to >>> > >> > >> >> >> >> > be expressed as a SQL string which isn't ideal. >>> > >> > >> >> >> >> > >>> > >> > >> >> >> >> > Working with some programmatic, structured object that >>> > has >>> > >> > the >>> > >> > >> same >>> > >> > >> >> >> >> > semantics ("Logical Plan", or whatnot) as a SQL query >>> > would >>> > >> > >> have, >>> > >> > >> >> >> would >>> > >> > >> >> >> >> be >>> > >> > >> >> >> >> > a better experience >>> > >> > >> >> >> >> > (Jacques is on to something here!) 
>>> > >> > >> >> >> >> > >>> > >> > >> >> >> >> > This interface between services would be somewhat the >>> > >> > >> equivalent of >>> > >> > >> >> >> an >>> > >> > >> >> >> >> > "SDK", so it would be nice to have a strongly-typed >>> > library >>> > >> > for >>> > >> > >> >> >> >> expressing >>> > >> > >> >> >> >> > and building-up query/data-compute ops. >>> > >> > >> >> >> >> > >>> > >> > >> >> >> >> > >>> > >> > >> >> >> >> > On Thu, Mar 3, 2022 at 3:17 PM David Li < >>> > >> lidav...@apache.org >>> > >> > > >>> > >> > >> >> wrote: >>> > >> > >> >> >> >> > >>> > >> > >> >> >> >> > > You probably want Substrait: https://substrait.io/ >>> > >> > >> >> >> >> > > >>> > >> > >> >> >> >> > > Which is being worked on by several people, >>> including >>> > >> Arrow >>> > >> > >> >> >> community >>> > >> > >> >> >> >> > > members. >>> > >> > >> >> >> >> > > >>> > >> > >> >> >> >> > > It might be interesting to generalize Flight SQL to >>> > >> include >>> > >> > >> >> >> support for >>> > >> > >> >> >> >> > > Substrait. I'm curious what your application, if >>> > you're >>> > >> > able >>> > >> > >> to >>> > >> > >> >> >> share >>> > >> > >> >> >> >> > more. >>> > >> > >> >> >> >> > > >>> > >> > >> >> >> >> > > -David >>> > >> > >> >> >> >> > > >>> > >> > >> >> >> >> > > On Thu, Mar 3, 2022, at 18:05, Gavin Ray wrote: >>> > >> > >> >> >> >> > > > Hiya, >>> > >> > >> >> >> >> > > > >>> > >> > >> >> >> >> > > > I am drafting a proposal for a way to enable >>> > services >>> > >> to >>> > >> > >> >> express >>> > >> > >> >> >> data >>> > >> > >> >> >> >> > > > compute operations to each other. >>> > >> > >> >> >> >> > > > >>> > >> > >> >> >> >> > > > However I think it'll be difficult to get buy-in >>> if >>> > the >>> > >> > only >>> > >> > >> >> >> >> > > representation >>> > >> > >> >> >> >> > > > for queries is as SQL strings. 
>>> > >> > >> >> >> >> > > >
>>> > >> > >> >> >> >> > > > Is there any kind of lower-level API that can be used to express
>>> > >> > >> >> >> >> > > > operations?
>>> > >> > >> >> >> >> > > >
>>> > >> > >> >> >> >> > > > IE instead of "SELECT name FROM user"
>>> > >> > >> >> >> >> > > >
>>> > >> > >> >> >> >> > > > A structured representation like:
>>> > >> > >> >> >> >> > > > {
>>> > >> > >> >> >> >> > > >   "op": "query",
>>> > >> > >> >> >> >> > > >   "schema": "user",
>>> > >> > >> >> >> >> > > >   "project": ["name"]
>>> > >> > >> >> >> >> > > > }
>>> > >> > >> >> >> >> > > >
>>> > >> > >> >> >> >> > > > Or maybe this is a bad idea/doesn't make sense?
>>> > >> > >> >> >> >> > > >
>>> > >> > >> >> >> >> > > > Thank you =)
>>> > >> > >> >> >> >> > >
>>> > >> > >> >> >> >> >
>>> > >> > >> >> >> >>
>>> > >> > >> >> >> >>
>>> > >> > >> >> >> >> --
>>> > >> > >> >> >> >>
>>> > >> > >> >> >> >> *James Duong*
>>> > >> > >> >> >> >> Lead Software Developer
>>> > >> > >> >> >> >> Bit Quill Technologies Inc.
>>> > >> > >> >> >> >> Direct: +1.604.562.6082 | jam...@bitquilltech.com
>>> > >> > >> >> >> >> https://www.bitquilltech.com
>>> > >> > >> >> >> >>
>>> > >> > >> >> >> >> This email message is for the sole use of the intended recipient(s) and may
>>> > >> > >> >> >> >> contain confidential and privileged information. Any unauthorized review,
>>> > >> > >> >> >> >> use, disclosure, or distribution is prohibited. If you are not the
>>> > >> > >> >> >> >> intended recipient, please contact the sender by reply email and destroy
>>> > >> > >> >> >> >> all copies of the original message. Thank you.