I am enthusiastic about Substrait and have followed its progress eagerly =D

When I presented it as a tentative option, there were reservations because
of the project/spec being young and the functionality still being
fleshed out.
I think if I were having this conversation in, say, 8-16 months, it would
be an easy choice, no doubt.

Since this is a public mailing list, I can only give the gist of it here
(I can share more details in private if you're curious):

Some well-defined solution, backed by mature tech, for expressing data
compute operations between services would be a useful thing to have
(especially if it's language-agnostic).

The goal is for an "implementing service" to have:
- An introspectable schema (i.e. "describe yourself to me")
- A query/operation execution endpoint (i.e. "perform this operation on your
data")
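
To make those two requirements concrete, here is a rough sketch of what
such a service contract could look like. Every name in it (Column, Table,
ImplementingService, InMemoryService) is hypothetical and made up for
illustration; none of it comes from Flight SQL or Substrait, and the
operation shape simply reuses the JSON from my original message.

```python
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass(frozen=True)
class Column:
    name: str
    type: str  # illustrative type names such as "string" or "int64"


@dataclass(frozen=True)
class Table:
    name: str
    columns: tuple[Column, ...]


class ImplementingService(Protocol):
    """Hypothetical contract: introspection plus operation execution."""

    def describe(self) -> list[Table]:
        """'Describe yourself to me.'"""
        ...

    def execute(self, operation: dict[str, Any]) -> list[dict[str, Any]]:
        """'Perform this operation on your data.'"""
        ...


class InMemoryService:
    """Toy implementation that only understands a projection query."""

    def __init__(self, tables: list[Table], rows: dict[str, list[dict[str, Any]]]):
        self._tables = tables
        self._rows = rows

    def describe(self) -> list[Table]:
        return list(self._tables)

    def execute(self, operation: dict[str, Any]) -> list[dict[str, Any]]:
        if operation.get("op") != "query":
            raise ValueError(f"unsupported operation: {operation.get('op')!r}")
        rows = self._rows[operation["schema"]]
        # Keep only the projected columns of each row.
        return [{col: row[col] for col in operation["project"]} for row in rows]


service: ImplementingService = InMemoryService(
    tables=[Table("user", (Column("id", "int64"), Column("name", "string")))],
    rows={"user": [{"id": 1, "name": "ada"}]},
)
described = service.describe()
result = service.execute({"op": "query", "schema": "user", "project": ["name"]})
```

The point is only that a caller needs nothing beyond describe() and
execute() to work with any service implementing the contract.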

I believe this is possible with Flight SQL, but it requires the operation
to be expressed as a SQL string, which isn't ideal.

Working with some programmatic, structured object that has the same
semantics as a SQL query (a "Logical Plan", or whatnot) would be a better
experience.
(Jacques is on to something here!)

This interface between services would be somewhat the equivalent of an
"SDK", so it would be nice to have a strongly-typed library for expressing
and building-up query/data-compute ops.
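
As a minimal sketch of what I mean by "strongly-typed", assuming a
made-up Query class and from_table helper (neither is a real Substrait or
Arrow API), a builder could produce the same structured shape as the JSON
in my original message:

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Query:
    """Hypothetical typed stand-in for 'SELECT <project> FROM <schema>'."""

    schema: str
    project: tuple[str, ...] = ()

    def select(self, *columns: str) -> "Query":
        # Return a new Query so that partially built plans can be reused.
        return Query(self.schema, self.project + columns)

    def to_dict(self) -> dict[str, Any]:
        # Mirrors the JSON shape from the original message.
        return {"op": "query", "schema": self.schema, "project": list(self.project)}


def from_table(name: str) -> Query:
    return Query(schema=name)


# Equivalent of: SELECT name FROM user
plan = from_table("user").select("name")
payload = json.dumps(plan.to_dict())
```

A type checker can then flag a malformed plan before anything is sent
over the wire, which a raw SQL string can't offer.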


On Thu, Mar 3, 2022 at 3:17 PM David Li <lidav...@apache.org> wrote:

> You probably want Substrait: https://substrait.io/
>
> Which is being worked on by several people, including Arrow community
> members.
>
> It might be interesting to generalize Flight SQL to include support for
> Substrait. I'm curious what your application is, if you're able to share more.
>
> -David
>
> On Thu, Mar 3, 2022, at 18:05, Gavin Ray wrote:
> > Hiya,
> >
> > I am drafting a proposal for a way to enable services to express data
> > compute operations to each other.
> >
> > However I think it'll be difficult to get buy-in if the only
> > representation for queries is as SQL strings.
> >
> > Is there any kind of lower-level API that can be used to express
> > operations?
> >
> > IE instead of "SELECT name FROM user"
> >
> > A structured representation like:
> > {
> >   "op": "query",
> >   "schema": "user",
> >   "project": ["name"]
> > }
> >
> > Or maybe this is a bad idea/doesn't make sense?
> >
> > Thank you =)
>
