Thank you both for pushing this forward

On Mon, Dec 12, 2022 at 1:14 PM James Duong
<james.du...@improving.com.invalid> wrote:

> Hi David,
>
> I've written up the URI parsing in C++ and started adding session
> management messages. I'm also planning on having the
> ClientCookieMiddlewareFactory be able to report if sessions are enabled on
> the server.
>
> I (or another developer) will send an update once those features are ready
> for demo.
> ________________________________
> From: David Li <lidav...@apache.org>
> Sent: December 12, 2022 10:07 AM
> To: dev@arrow.apache.org <dev@arrow.apache.org>
> Subject: Re: DISCUSS: [FlightSQL] Catalog support
>
> Following up here, James are you interested in putting up a draft PR for
> the Flight SQL URI format and for session management?
>
> The Flight SQL URI format would then also cover Andrew's use case. And if
> someone wants to draw up a PR to the JDBC driver to enable arbitrary
> properties, I can review that too.
>
> On Sat, Dec 3, 2022, at 05:38, Andrew Lamb wrote:
> >> Andrew, do we need to look into adding more metadata to indicate
> > different query languages? (It's quite a shame that we named this Flight
> > SQL at this point...)
> >
> > TDLR is I don't think trying to explicitly support languages other than
> SQL
> > in FlightSQL is a good idea. Among other reasons, the JDBC / ODBC
> drivers,
> > which mostly assume SQL, are one of the key features of FlightSQL, and
> they
> > are likely not as useful for non SQL. I can see the argument to support
> for
> > substrait plans, and it will be interesting to see what use cases benefit
> > from that support.
> >
> > What would make our life easier would be some standard way to pass
> > application specific key/value pairs from the JDBC driver to a flight SQL
> > backend with each request (perhaps via gRPC headers). This would allow
> > passing configuration parameters that were not envisioned in the spec,
> from
> > end user (of the JDBC driver) all the way to our backend.
> >
> > Thanks again for driving this forward,
> > Andrew
> >
> > On Thu, Dec 1, 2022 at 7:11 PM David Li <lidav...@apache.org> wrote:
> >
> >> Hey James, thanks for putting this up.
> >>
> >> Inline:
> >>
> >> > The suggestion is to make this part of Flight as an
> >> > optional feature, rather than Flight SQL due to its applicability
> outside
> >> > of just database access.
> >>
> >> Which uses do you see? I see statefulness as a general antipattern here,
> >> so I'm wary of introducing it beyond where we need it.
> >>
> >> > - The Flight client supplies a New-Session header which has key-value
> >> pairs
> >> > for initial session options. This header can be applied to any RPC
> call,
> >> > but logically should be the first one the client makes.
> >>
> >> Handshake already effectively serves as this RPC - maybe we could extend
> >> it? (I also see Handshake as an antipattern because it's a stateful auth
> >> mechanism.)
> >>
> >> Should the session timeout/be on a lease? (gRPC doesn't really give the
> >> server a way to track the persistence of a particular client
> connection.)
> >>
> >> > It's a bit asymmetric that creating a new session is done by applying
> a
> >> > header, but closing a session is an RPC call. This was so that session
> >> > creation doesn't introduce another round trip before the first real
> data
> >> > request. If there's a way to batch RPC calls it might be better to
> make
> >> > session creation an RPC call.
> >>
> >> Is this a worrisome amount of overhead?
> >>
> >> Unfortunately gRPC doesn't batch RPC calls, but RPC calls on the same
> >> client generally share the same TCP connection (modulo load balancing
> >> behavior, but presumably that is not enabled if you want persistent
> >> sessions).
> >>
> >> On the implementation side, I'd like to avoid baking this in too deeply
> if
> >> at all possible. Ideally it'd be implemented entirely as middleware,
> >> possibly making use of an interface so applications can override the
> >> session storage (hashtable, Redis, etcd, etc.)
> >>
> >> > Just to chime in on this, one thing I'm curious about is whether there
> >> > will be support for user-defined catalog/schema hierarchy depth?
> >>
> >> Gavin, for ADBC we discussed adding a delimiter to the catalog name to
> >> handle this case - maybe we can handle this by adding a property for the
> >> delimiter to SqlInfo?
> >>
> >> > https://github.com/influxdata/influxdb_iox/issues/6102
> >>
> >> Andrew, do we need to look into adding more metadata to indicate
> different
> >> query languages? (It's quite a shame that we named this Flight SQL at
> this
> >> point...)
> >>
> >> On Thu, Dec 1, 2022, at 08:49, Andrew Lamb wrote:
> >> > Sorry for the late reply -- thank you James and David for this
> >> discussion.
> >> >
> >> > I agree that adding Catalog support would be a valuable addition to
> >> Flight
> >> > SQL, and it recently came up as we begin to implement Flight SQL in
> >> > InfluxDB IOx [1].
> >> >
> >> >> - A standard URI scheme for Flight SQL that can be used by multiple
> >> > client APIs (JDBC, ADBC, etc.)
> >> >
> >> > I agree this would be very valuable, along with a standard way
> (ideally
> >> > with HTTP headers) to send this information as part of the FlightSQL
> gRPC
> >> > requests.
> >> >
> >> >> I'd suggest we define session management features explicitly in
> Flight
> >> > (while being optional).
> >> >
> >> > I agree it is critical that server-side state is not required to
> >> implement
> >> > FlightSQL. Stateful connections would likely complicate deploying
> >> FlightSQL
> >> > in distributed systems. I suggest it should be possible to implement
> any
> >> > session management features by sending the entire session state with
> the
> >> > request, if desired.
> >> >
> >> > I don't have a strong opinion about the merits of including explicit
> >> > session management features in FlightSQL. It seems to me that keeping
> the
> >> > API surface of FlightSQL minimal and implementation flexibility
> maximal
> >> > should be the default. However, if JDBC/ODBC driver compatibility
> would
> >> be
> >> > improved with explicit state management APIs, then adding them to
> >> FlightSQL
> >> > seems like a good idea to me.
> >> >
> >> > Thanks again -- it is amazing to hit some issue in design and then
> find
> >> out
> >> > the Arrow community is already hard at work on a solution.
> >> >
> >> > Andrew
> >> >
> >> > [1] https://github.com/influxdata/influxdb_iox/issues/6102
> >> >
> >> > On Wed, Nov 30, 2022 at 7:17 PM Gavin Ray <ray.gavi...@gmail.com>
> wrote:
> >> >
> >> >> Just to chime in on this, one thing I'm curious about is whether
> there
> >> >> will be support for user-defined catalog/schema hierarchy depth?
> >> >>
> >> >> This comment that James made does seem reasonable to me
> >> >> > scheme://<host>:<port>/path-1/path-2/.../path-n
> >> >>
> >> >> Trino/Presto does a similar thing
> (jdbc:trino://localhost:8080/tpch/sf1)
> >> >>
> >> >> At Hasura, what we do is have an alias "FullyQualifiedName" which is
> >> >> just "Array<String>"
> >> >> and the identifier to some element in a data source is always
> >> >> fully-qualified:
> >> >>
> >> >>
> https://github.com/hasura/graphql-engine/tree/master/dc-agents#schema
> >> >>
> >> >> ["postgres_1", "db1", "schema2", "my_table", "col_a"]
> >> >> ["mongo", "db1",  "collection_a", "field_a"]
> >> >> ["csv_adapter", "myfile.csv", "col_x"]
> >> >>
> >> >> On Wed, Nov 30, 2022 at 6:31 PM James Duong
> >> >> <jam...@bitquilltech.com.invalid> wrote:
> >> >> >
> >> >> > Our current convention of sending connection properties as headers
> >> with
> >> >> > every request has the benefit of making statefulness optional, but
> has
> >> >> the
> >> >> > drawback of sending redundant, unused properties on requests after
> the
> >> >> > first, which increases the payload size unnecessarily.
> >> >> >
> >> >> > I'd suggest we define session management features explicitly in
> Flight
> >> >> > (while being optional). The suggestion is to make this part of
> Flight
> >> as
> >> >> an
> >> >> > optional feature, rather than Flight SQL due to its applicability
> >> outside
> >> >> > of just database access.
> >> >> >
> >> >> > Creating a session:
> >> >> > - The Flight client supplies a New-Session header which has
> key-value
> >> >> pairs
> >> >> > for initial session options. This header can be applied to any RPC
> >> call,
> >> >> > but logically should be the first one the client makes.
> >> >> > - The server should send a Set-Cookie header back containing some
> >> >> > server-side representation of the session that the client can use
> in
> >> >> > subsequent requests.
> >> >> > - The path specified in the URI is sent as a "Catalog" session
> option.
> >> >> >
> >> >> > Modifying session options:
> >> >> > - A separate RPC call that takes in a Stream<string, string>
> >> representing
> >> >> > each session option that is being modified and returns a stream of
> >> >> statuses
> >> >> > to indicate if the setting change was accepted.
> >> >> > - This RPC call is only valid when the Cookie header is used.
> >> >> > - It is up to the server to define if a failed session property
> >> change is
> >> >> > fatal or if other properties can continue to be set.
> >> >> >
> >> >> > Closing a session:
> >> >> > - A separate RPC call that tells the server to drop the session
> >> specified
> >> >> > by the Cookie header.
> >> >> >
> >> >> > Notes:
> >> >> > A Flight SQL client would check if session management RPCs are
> >> supported
> >> >> > through a new GetSqlInfo property. A Flight client doesn't have a
> way
> >> to
> >> >> do
> >> >> > this generically, but there could be an application-specific RPC or
> >> >> header
> >> >> > that reports this metadata.
> >> >> >
> >> >> > The O/JDBC and ADBC drivers would need to be updated to
> >> programmatically
> >> >> > check for session management RPCs. If unsupported, then use the old
> >> >> > behavior of sending all properties as headers with each request. If
> >> >> > supported, make use of the New-Session header and drop the session
> >> when
> >> >> > closing the client-side connection.
> >> >> >
> >> >> > It's a bit asymmetric that creating a new session is done by
> applying
> >> a
> >> >> > header, but closing a session is an RPC call. This was so that
> session
> >> >> > creation doesn't introduce another round trip before the first real
> >> data
> >> >> > request. If there's a way to batch RPC calls it might be better to
> >> make
> >> >> > session creation an RPC call.
> >> >> >
> >> >> > On Tue, Nov 22, 2022 at 3:16 PM David Li <lidav...@apache.org>
> wrote:
> >> >> >
> >> >> > > It sounds reasonable - then there are three points:
> >> >> > >
> >> >> > > - A standard URI scheme for Flight SQL that can be used by
> multiple
> >> >> client
> >> >> > > APIs (JDBC, ADBC, etc.)
> >> >> > > - A standard scheme for session data (likely header/cookie-based)
> >> >> > > - A mapping from URI parameters and fields to session data
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > On Tue, Nov 22, 2022, at 17:45, James Duong wrote:
> >> >> > > > Just following up on this and if there are any thoughts.
> >> >> > > >
> >> >> > > > The purpose would be to standardize how we specify access to
> some
> >> >> named
> >> >> > > > logical grouping of data. This would make it easy to model
> >> >> catalog/schema
> >> >> > > > semantics in Flight SQL.
> >> >> > > >
> >> >> > > > Having this be part of the connection URI makes it similar to
> >> >> specifying
> >> >> > > a
> >> >> > > > resource in an HTTP URL (ie an endpoint) which should make it
> easy
> >> >> for
> >> >> > > end
> >> >> > > > users to work with and modify.
> >> >> > > >
> >> >> > > > On Fri, Nov 18, 2022 at 3:17 PM James Duong <
> >> jam...@bitquilltech.com
> >> >> >
> >> >> > > wrote:
> >> >> > > >
> >> >> > > >> As for surfacing catalogs itself, perhaps we allow the URI
> take
> >> in a
> >> >> > > path
> >> >> > > >> and treat that as a way of specifying a multi-level resource
> that
> >> >> which
> >> >> > > the
> >> >> > > >> FlightClient is connecting to:
> >> >> > > >>
> >> >> > > >> eg a connection URI of the form:
> >> >> > > >> scheme://<host>:<port>/path-1/path-2/.../path-n
> >> >> > > >>
> >> >> > > >> The FlightClient could send this path as either a header or a
> >> >> session
> >> >> > > >> property (with a neutral name like 'resource-path'). Flight
> SQL
> >> >> > > Producers
> >> >> > > >> could interpret this as a catalog or schema.
> >> >> > > >> eg
> >> >> > > >> grpc://<host>:<port>/catalog/schema
> >> >> > > >>
> >> >> > > >> On Fri, Nov 11, 2022 at 2:07 AM James Henderson <j...@juxt.pro
> >
> >> >> wrote:
> >> >> > > >>
> >> >> > > >>> Sounds good to me.
> >> >> > > >>>
> >> >> > > >>> > Are you interested in writing up a (sketch of a) proposal?
> >> >> > > >>>
> >> >> > > >>> Yep, can do - I'm OoO over the next couple of weeks so might
> be
> >> a
> >> >> bit
> >> >> > > >>> intermittent.
> >> >> > > >>>
> >> >> > > >>> On Thu, 10 Nov 2022 at 15:28, David Li <lidav...@apache.org>
> >> >> wrote:
> >> >> > > >>>
> >> >> > > >>> > Hey James H.,
> >> >> > > >>> >
> >> >> > > >>> > That would make sense to me. So it sounds like we'd want
> >> >> > > >>> >
> >> >> > > >>> > - Formal specification of using cookies/headers to mark a
> >> >> 'session'
> >> >> > > (I
> >> >> > > >>> > guess this will be a little inconsistent with transactions,
> >> >> though)
> >> >> > > >>> > - Adding RPCs to query session values
> >> >> > > >>> > - Adding RPCs to set session values
> >> >> > > >>> > - Listing standard values and types
> >> >> > > >>> >
> >> >> > > >>> > Some things may require more consideration, e.g.
> transaction
> >> >> > > isolation
> >> >> > > >>> > might be better off as part of the transaction RPCs than an
> >> >> ambient
> >> >> > > >>> > property. Are you interested in writing up a (sketch of a)
> >> >> proposal?
> >> >> > > >>> >
> >> >> > > >>> > -David
> >> >> > > >>> >
> >> >> > > >>> > On Thu, Nov 10, 2022, at 10:09, James Henderson wrote:
> >> >> > > >>> > > Similarly, we're also currently considering how best to
> >> >> implement
> >> >> > > >>> some of
> >> >> > > >>> > > the SQL standard session variables in our Flight SQL
> server
> >> -
> >> >> > > things
> >> >> > > >>> like
> >> >> > > >>> > > current transaction isolation level, access mode, time
> zone
> >> >> etc,
> >> >> > > which
> >> >> > > >>> > seem
> >> >> > > >>> > > to have similar properties to the (traditional)
> connection's
> >> >> > > current
> >> >> > > >>> > > catalog. We'd (perhaps naively) looked at solutions
> >> involving
> >> >> the
> >> >> > > >>> Flight
> >> >> > > >>> > > client's `ClientCookieMiddleware`, but might these have
> >> >> > > standardised
> >> >> > > >>> > > support within Flight SQL itself eventually?
> >> >> > > >>> > >
> >> >> > > >>> > > Cheers,
> >> >> > > >>> > >
> >> >> > > >>> > > James
> >> >> > > >>> > >
> >> >> > > >>> > > On Wed, 9 Nov 2022 at 21:51, David Li <
> lidav...@apache.org>
> >> >> wrote:
> >> >> > > >>> > >
> >> >> > > >>> > >> I think having better support for this makes sense, but
> >> >> perhaps we
> >> >> > > >>> can
> >> >> > > >>> > >> find a way to make it not tied to the connection itself?
> >> For
> >> >> > > >>> instance,
> >> >> > > >>> > in
> >> >> > > >>> > >> the same way transactions were implemented (as a
> handle).
> >> Or
> >> >> > > rather,
> >> >> > > >>> > >> instead of adding connection statefulness to Flight RPC,
> >> I'd
> >> >> > > rather
> >> >> > > >>> try
> >> >> > > >>> > to
> >> >> > > >>> > >> work within the gRPC/RPC paradigm.
> >> >> > > >>> > >>
> >> >> > > >>> > >> On Wed, Nov 9, 2022, at 16:47, James Duong wrote:
> >> >> > > >>> > >> > Databases such as SQL Server and PostgreSQL have the
> >> >> concept of
> >> >> > > >>> > catalogs
> >> >> > > >>> > >> as
> >> >> > > >>> > >> > containers of database schemas. Users can usually
> >> specify an
> >> >> > > >>> initial
> >> >> > > >>> > >> > catalog during the connection process, list catalogs,
> and
> >> >> > > sometimes
> >> >> > > >>> > >> change
> >> >> > > >>> > >> > catalogs during the session.
> >> >> > > >>> > >> >
> >> >> > > >>> > >> > Currently catalog support in Flight SQL is somewhat
> >> >> limited. The
> >> >> > > >>> > protocol
> >> >> > > >>> > >> > provides a way to list catalogs as well as metadata in
> >> >> > > SqlTypeInfo
> >> >> > > >>> for
> >> >> > > >>> > >> > reporting how catalogs are supported from a syntax
> >> >> perspective.
> >> >> > > >>> > >> >
> >> >> > > >>> > >> > ODBC and JDBC provide API calls for changing the
> catalog.
> >> >> > > >>> > Additionally,
> >> >> > > >>> > >> > Flight SQL doesn't really provide the concept of
> >> "initial"
> >> >> > > >>> connection
> >> >> > > >>> > >> > properties (such as a starting catalog) since Flight
> >> itself
> >> >> is
> >> >> > > >>> > stateless
> >> >> > > >>> > >> > from a connection perspective.
> >> >> > > >>> > >> >
> >> >> > > >>> > >> > To support catalogs properly, I'd imagine we need to
> make
> >> >> some
> >> >> > > >>> > changes to
> >> >> > > >>> > >> > the Flight SQL protocol:
> >> >> > > >>> > >> > - Introduce the concept of connection-time properties
> >> >> (perhaps
> >> >> > > an
> >> >> > > >>> > >> optional
> >> >> > > >>> > >> > RPC for Flight SQL applications that need this)
> >> >> > > >>> > >> > - Related to the above, expand the connection URL and
> >> Java
> >> >> > > builder
> >> >> > > >>> to
> >> >> > > >>> > >> allow
> >> >> > > >>> > >> > arbitrary application-specific properties.
> >> >> > > >>> > >> > - Add optional RPCs for changing the catalog and
> relevant
> >> >> error
> >> >> > > >>> codes
> >> >> > > >>> > if
> >> >> > > >>> > >> > this is not permitted.
> >> >> > > >>> > >> >
> >> >> > > >>> > >> >
> >> >> > > >>> > >> > --
> >> >> > > >>> > >> >
> >> >> > > >>> > >> > *James Duong*
> >> >> > > >>> > >> > Lead Software Developer
> >> >> > > >>> > >> > Bit Quill Technologies Inc.
> >> >> > > >>> > >> > Direct: +1.604.562.6082 | jam...@bitquilltech.com
> >> >> > > >>> > >> > https://www.bitquilltech.com
> >> >> > > >>> > >> >
> >> >> > > >>> > >> > This email message is for the sole use of the intended
> >> >> > > recipient(s)
> >> >> > > >>> > and
> >> >> > > >>> > >> may
> >> >> > > >>> > >> > contain confidential and privileged information.  Any
> >> >> > > unauthorized
> >> >> > > >>> > >> review,
> >> >> > > >>> > >> > use, disclosure, or distribution is prohibited.  If
> you
> >> are
> >> >> not
> >> >> > > the
> >> >> > > >>> > >> > intended recipient, please contact the sender by reply
> >> >> email and
> >> >> > > >>> > destroy
> >> >> > > >>> > >> > all copies of the original message.  Thank you.
> >> >> > > >>> > >>
> >> >> > > >>> > >
> >> >> > > >>> > >
> >> >> > > >>> > > --
> >> >> > > >>> > > *James Henderson*
> >> >> > > >>> > > XTDB Developer at *JUXT*
> >> >> > > >>> > > Email j...@juxt.pro
> >> >> > > >>> > > Website https://juxt.pro
> >> >> > > >>> > >
> >> >> > > >>> > > [image: photo]
> >> >> > > >>> >
> >> >> > > >>>
> >> >> > > >>>
> >> >> > > >>> --
> >> >> > > >>> *James Henderson*
> >> >> > > >>> XTDB Developer at *JUXT*
> >> >> > > >>> Email j...@juxt.pro
> >> >> > > >>> Website https://juxt.pro
> >> >> > > >>>
> >> >> > > >>> [image: photo]
> >> >> > > >>>
> >> >> > > >>
> >> >> > > >>
> >> >> > > >> --
> >> >> > > >>
> >> >> > > >> *James Duong*
> >> >> > > >> Lead Software Developer
> >> >> > > >> Bit Quill Technologies Inc.
> >> >> > > >> Direct: +1.604.562.6082 | jam...@bitquilltech.com
> >> >> > > >> https://www.bitquilltech.com
> >> >> > > >>
> >> >> > > >> This email message is for the sole use of the intended
> >> recipient(s)
> >> >> and
> >> >> > > >> may contain confidential and privileged information.  Any
> >> >> unauthorized
> >> >> > > >> review, use, disclosure, or distribution is prohibited.  If
> you
> >> are
> >> >> not
> >> >> > > the
> >> >> > > >> intended recipient, please contact the sender by reply email
> and
> >> >> destroy
> >> >> > > >> all copies of the original message.  Thank you.
> >> >> > > >>
> >> >> > > >
> >> >> > > >
> >> >> > > > --
> >> >> > > >
> >> >> > > > *James Duong*
> >> >> > > > Lead Software Developer
> >> >> > > > Bit Quill Technologies Inc.
> >> >> > > > Direct: +1.604.562.6082 | jam...@bitquilltech.com
> >> >> > > > https://www.bitquilltech.com
> >> >> > > >
> >> >> > > > This email message is for the sole use of the intended
> >> recipient(s)
> >> >> and
> >> >> > > may
> >> >> > > > contain confidential and privileged information.  Any
> unauthorized
> >> >> > > review,
> >> >> > > > use, disclosure, or distribution is prohibited.  If you are not
> >> the
> >> >> > > > intended recipient, please contact the sender by reply email
> and
> >> >> destroy
> >> >> > > > all copies of the original message.  Thank you.
> >> >> > >
> >> >> >
> >> >> >
> >> >> > --
> >> >> >
> >> >> > *James Duong*
> >> >> > Lead Software Developer
> >> >> > Bit Quill Technologies Inc.
> >> >> > Direct: +1.604.562.6082 | jam...@bitquilltech.com
> >> >> > https://www.bitquilltech.com
> >> >> >
> >> >> > This email message is for the sole use of the intended recipient(s)
> >> and
> >> >> may
> >> >> > contain confidential and privileged information.  Any unauthorized
> >> >> review,
> >> >> > use, disclosure, or distribution is prohibited.  If you are not the
> >> >> > intended recipient, please contact the sender by reply email and
> >> destroy
> >> >> > all copies of the original message.  Thank you.
> >> >>
> >>
>

Reply via email to