Thank you. I think the use case is great, but I agree with the other
reviewers that the name may be confusing. I left some notes on the ticket.

Andrew

On Wed, Feb 14, 2024 at 3:52 PM David Li <lidav...@apache.org> wrote:

> I've put up a candidate implementation sans integration test [1].
>
> Some caveats:
> - java.net.URI doesn't accept 'scheme://', only 'scheme:/' or 'scheme://?'
>   (yes, an empty query string pacifies it). I've chosen the latter since the
>   former is technically a URI with a non-empty path ('/'), but neither is
>   ideal.
> - I've changed the scheme to 'arrow-flight-reuse-connection' to be more
>   faithful to the intended use than 'fallback'.
>
> [1]: https://github.com/apache/arrow/pull/40084
>
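To illustrate the java.net.URI caveat above, here is a small standalone check. It is only a sketch: the class name `ReuseConnectionUriCheck` and the `tryParse` helper are made up for this example and are not part of Arrow. It tries the three spellings of the scheme and reports which ones the JDK parser accepts.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ReuseConnectionUriCheck {
    // Returns the parsed URI, or null if java.net.URI rejects the string.
    static URI tryParse(String s) {
        try {
            return new URI(s);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "arrow-flight-reuse-connection://",  // empty authority with nothing after it
            "arrow-flight-reuse-connection:/",   // parses, but the path is "/"
            "arrow-flight-reuse-connection://?", // parses, with an empty query string
        };
        for (String c : candidates) {
            URI uri = tryParse(c);
            if (uri == null) {
                System.out.println(c + " -> rejected");
            } else {
                String query = uri.getQuery() == null ? "null" : "\"" + uri.getQuery() + "\"";
                System.out.println(c + " -> path=\"" + uri.getPath() + "\", query=" + query);
            }
        }
    }
}
```

The java.net.URI documentation lists, among its deviations from RFC 2396, that an empty authority is only permitted when followed by a non-empty path, a query, or a fragment, which is why the trailing '?' is needed for the 'scheme://' spelling.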
> On Tue, Feb 13, 2024, at 13:01, Jean-Baptiste Onofré wrote:
> > Hi David,
> >
> > It's reasonable. I think we can start with your initial proposal (it
> > sounds fine to me) and we can always improve step by step.
> >
> > Thanks!
> > Regards
> > JB
> >
> > On Tue, Feb 13, 2024 at 4:53 PM David Li <lidav...@apache.org> wrote:
> >>
> >> I'm going to keep the proposal as-is then. It can be extended if this use
> >> case comes up.
> >>
> >> I'll start work on candidate implementations now.
> >>
> >> On Tue, Feb 13, 2024, at 03:22, Antoine Pitrou wrote:
> >> > I think the original proposal is sufficient.
> >> >
> >> > Also, it is not obvious to me how one would switch from e.g. grpc+tls to
> >> > http without an explicit server location (unless both Flight servers are
> >> > hosted under the same port?). So the "+" proposal seems a bit weird.
> >> >
> >> >
> >> > On 12/02/2024 at 23:39, David Li wrote:
> >> >> The idea is that the client would reuse the existing connection, in
> >> >> which case the protocol and such are implicit. (If the client doesn't
> >> >> have a connection anymore, it can't use the fallback anyway.)
> >> >>
> >> >> I suppose this has the advantage that you could "fall back" to a known
> >> >> hostname with a different protocol, but I'm not sure that always applies
> >> >> anyway. (Correct me if I'm wrong, Matt, but as I recall, UCX addresses
> >> >> aren't hostnames but rather opaque byte blobs, for instance.)
> >> >>
> >> >> If we do prefer this, to avoid overloading the hostname, there's also
> >> >> the informal convention of using + in the scheme, so it could be
> >> >> arrow-flight-fallback+grpc+tls://, arrow-flight-fallback+http://, etc.
> >> >>
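As a sketch of how the "+" variant would be read on the client side (the thread later kept the original proposal instead, so this is illustration only): the client strips a marker prefix from the scheme and treats the remainder as the transport. The class `PlusSchemeSketch`, the `MARKER` constant, and `transportFor` are hypothetical names; the '://?' spelling is used only because java.net.URI rejects a bare 'scheme://'.

```java
import java.net.URI;
import java.util.Optional;

public class PlusSchemeSketch {
    // Hypothetical marker prefix from the "+" variant discussed in this message.
    static final String MARKER = "arrow-flight-fallback+";

    // If the scheme starts with the marker, return the remaining transport scheme
    // (e.g. "grpc+tls"); otherwise return Optional.empty().
    static Optional<String> transportFor(URI uri) {
        String scheme = uri.getScheme();
        if (scheme != null && scheme.startsWith(MARKER) && scheme.length() > MARKER.length()) {
            return Optional.of(scheme.substring(MARKER.length()));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String s : new String[] {
                "arrow-flight-fallback+grpc+tls://?",
                "arrow-flight-fallback+http://?",
                "grpc+tls://example.com:31337"}) {
            URI uri = URI.create(s);
            System.out.println(s + " -> " + transportFor(uri).orElse("(not a fallback URI)"));
        }
    }
}
```

This also makes Antoine's objection concrete: the recovered transport says how to connect, but nothing in the URI says where, so the client must still rely on its existing connection's host.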
> >> >> On Mon, Feb 12, 2024, at 17:03, Joel Lubinitsky wrote:
> >> >>> Thanks for clarifying.
> >> >>>
> >> >>> Given the relationship between these two proposals, would it also be
> >> >>> necessary to distinguish the scheme (or schemes) supported by the
> >> >>> originating Flight RPC service?
> >> >>>
> >> >>> If that is the case, it may be preferable to use the "host" portion of
> >> >>> the URI rather than the "scheme" to denote the location of the data. In
> >> >>> this scenario, the host "0.0.0.0" could be used. This IP address is
> >> >>> defined in IETF RFC 1122 [1] as "This host on this network", which seems
> >> >>> most consistent with the intended use case. There are some caveats to
> >> >>> this usage, but in my experience it's not uncommon for protocols to
> >> >>> extend the definition of this address in their own usage.
> >> >>>
> >> >>> A benefit of this convention is that the scheme remains available in
> >> >>> the URI to specify the available transport. For example, the following
> >> >>> list of locations may be included in the response:
> >> >>>
> >> >>> ["grpc://0.0.0.0", "ucx://0.0.0.0", "grpc://1.2.3.4", <other_locations>...]
> >> >>>
> >> >>> This would indicate that the grpc and ucx transports are available from
> >> >>> the current service, grpc is available at 1.2.3.4, and possibly more
> >> >>> combinations of scheme/host.
> >> >>>
> >> >>> [1] https://datatracker.ietf.org/doc/html/rfc1122#section-3.2.1.3
> >> >>>
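A minimal sketch of how a client might interpret the "0.0.0.0" convention suggested here (not the one the thread adopted): any location whose host is 0.0.0.0 refers to the originating service, and its scheme names a transport available there. `ThisHostConvention` and its methods are hypothetical names used only for this illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class ThisHostConvention {
    // Under this suggestion, host 0.0.0.0 means "the service that returned this
    // endpoint"; the scheme still names the transport.
    static boolean refersToOriginatingService(URI location) {
        return "0.0.0.0".equals(location.getHost());
    }

    // Collect the transport schemes a client could use against the originating service.
    static List<String> transportsOnThisService(List<URI> locations) {
        List<String> schemes = new ArrayList<>();
        for (URI loc : locations) {
            if (refersToOriginatingService(loc)) {
                schemes.add(loc.getScheme());
            }
        }
        return schemes;
    }

    public static void main(String[] args) {
        List<URI> locations = List.of(
                URI.create("grpc://0.0.0.0"),
                URI.create("ucx://0.0.0.0"),
                URI.create("grpc://1.2.3.4"));
        System.out.println(transportsOnThisService(locations)); // prints [grpc, ucx]
    }
}
```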
> >> >>> On Mon, Feb 12, 2024 at 2:53 PM David Li <lidav...@apache.org> wrote:
> >> >>>
> >> >>>> Ah, while I was thinking of it as useful for a fallback, I'm not
> >> >>>> specifying it that way. Better ideas for names would be appreciated.
> >> >>>>
> >> >>>> The actual precedence has never been specified. All endpoints are
> >> >>>> equivalent, so clients may use whichever is "best". For instance, with
> >> >>>> Matt Topol's concurrent proposal, a GPU-enabled client may
> >> >>>> preferentially try UCX endpoints while other clients may choose to
> >> >>>> ignore them entirely (e.g. because they don't have UCX installed).
> >> >>>>
> >> >>>> In practice the ADBC/JDBC drivers just scan the list left to right and
> >> >>>> try each endpoint in turn, for lack of a better heuristic.
> >> >>>>
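The left-to-right heuristic described above can be sketched as follows. This is not the ADBC or JDBC driver code: `LeftToRightScan`, `firstWorking`, and the `attempt` predicate (standing in for a real DoGet call) are hypothetical, and the hostnames are placeholders.

```java
import java.net.URI;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

public class LeftToRightScan {
    // Try each location in list order, skipping schemes this client cannot speak,
    // and return the first location for which `attempt` reports success.
    static Optional<URI> firstWorking(List<URI> locations,
                                      Set<String> supportedSchemes,
                                      Predicate<URI> attempt) {
        for (URI loc : locations) {
            if (!supportedSchemes.contains(loc.getScheme())) {
                continue; // e.g. a client without UCX ignores ucx:// locations
            }
            if (attempt.test(loc)) {
                return Optional.of(loc);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<URI> locations = List.of(
                URI.create("ucx://gpu-node:1234"),
                URI.create("grpc+tls://replica-a:443"),
                URI.create("grpc+tls://replica-b:443"));
        // Pretend replica-a is down and only replica-b answers.
        Predicate<URI> attempt = uri -> "replica-b".equals(uri.getHost());
        System.out.println(firstWorking(locations, Set.of("grpc+tls"), attempt));
    }
}
```

Since precedence is unspecified, a GPU-enabled client could instead reorder the list to put ucx:// locations first before scanning; neither ordering is mandated.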
> >> >>>> On Mon, Feb 12, 2024, at 14:28, Joel Lubinitsky wrote:
> >> >>>>> Thanks for proposing this, David.
> >> >>>>>
> >> >>>>> I think the ability to include the Flight RPC service itself in the
> >> >>>>> list of endpoints from which data can be fetched is a helpful addition.
> >> >>>>>
> >> >>>>> The current choice of name for the URI (arrow-flight-fallback://) seems
> >> >>>>> to imply that there is an order of precedence that should be considered
> >> >>>>> in the list of URIs. Specifically, as a developer receiving the list of
> >> >>>>> locations, I might assume that I should try fetching from other
> >> >>>>> locations first. If those do not succeed, I may try the original service
> >> >>>>> as a fallback.
> >> >>>>>
> >> >>>>> Are these the intended semantics? If so, is there a way to include the
> >> >>>>> original service in the list of locations without the implied
> >> >>>>> precedence?
> >> >>>>>
> >> >>>>> Thanks,
> >> >>>>> Joel
> >> >>>>>
> >> >>>>> On Mon, Feb 12, 2024 at 11:52 James Duong
> >> >>>>> <james.du...@improving.com.invalid> wrote:
> >> >>>>>
> >> >>>>>> This seems like a good idea, and it also improves consistency with
> >> >>>>>> clients that erroneously assumed that the service endpoint was always
> >> >>>>>> in the list of endpoints.
> >> >>>>>>
> >> >>>>>> From: Antoine Pitrou <anto...@python.org>
> >> >>>>>> Date: Monday, February 12, 2024 at 6:05 AM
> >> >>>>>> To: dev@arrow.apache.org <dev@arrow.apache.org>
> >> >>>>>> Subject: Re: [DISCUSS] Flight RPC: add 'fallback' URI scheme
> >> >>>>>>
> >> >>>>>> Hello,
> >> >>>>>>
> >> >>>>>> This looks fine to me.
> >> >>>>>>
> >> >>>>>> Regards
> >> >>>>>>
> >> >>>>>> Antoine.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On 12/02/2024 at 14:46, David Li wrote:
> >> >>>>>>> Hello,
> >> >>>>>>>
> >> >>>>>>> I'd like to propose a slight update to Flight RPC to make Flight SQL
> >> >>>>>>> work better in different deployment scenarios. Comments on the doc
> >> >>>>>>> would be appreciated:
> >> >>>>>>>
> >> >>>>>>> https://docs.google.com/document/d/1g9M9FmsZhkewlT1mLibuceQO8ugI0-fqumVAXKFjVGg/edit?usp=sharing
> >> >>>>>>>
> >> >>>>>>> The gist is that FlightEndpoint allows specifying either (1) a list of
> >> >>>>>>> concrete URIs to fetch data from or (2) no URIs, meaning to fetch from
> >> >>>>>>> the Flight RPC service itself; but it would be useful to combine both
> >> >>>>>>> behaviors (try these concrete URIs and fall back to the Flight RPC
> >> >>>>>>> service itself) without requiring the service to know its own public
> >> >>>>>>> address.
> >> >>>>>>>
> >> >>>>>>> Best,
> >> >>>>>>> David
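A rough sketch of the client-side resolution this proposal implies: an empty location list means "only the current service", a location using the special scheme means "reuse the current connection", and any other URI is a concrete target. The scheme name is the one the thread settled on later ('arrow-flight-reuse-connection', renamed from 'fallback'); `EndpointTargets`, `targets`, and the `CURRENT_CONNECTION` placeholder are hypothetical and not Arrow APIs. Targets are kept in list order, matching how the drivers scan today.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class EndpointTargets {
    // Scheme name from later in the thread (renamed from arrow-flight-fallback).
    static final String REUSE_SCHEME = "arrow-flight-reuse-connection";

    // Hypothetical placeholder meaning "redeem the ticket on the connection the
    // client already has to the Flight RPC service".
    static final String CURRENT_CONNECTION = "<current connection>";

    // Expand a FlightEndpoint's location list into the targets a client would try.
    static List<String> targets(List<URI> locations) {
        List<String> out = new ArrayList<>();
        if (locations.isEmpty()) {
            // Existing behavior: no URIs means fetch from the service itself.
            out.add(CURRENT_CONNECTION);
            return out;
        }
        for (URI loc : locations) {
            if (REUSE_SCHEME.equals(loc.getScheme())) {
                // New behavior: the service is listed without knowing its own address.
                out.add(CURRENT_CONNECTION);
            } else {
                out.add(loc.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<URI> locations = List.of(
                URI.create("grpc+tls://cache.example.com:443"),
                URI.create(REUSE_SCHEME + "://?"));
        System.out.println(targets(locations));
        System.out.println(targets(List.of()));
    }
}
```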
> >> >>>>>>
> >> >>>>
>
