Thank you. I think the use case is great, but I agree with the other reviewers that the name may be confusing. I left some notes on the ticket.
Andrew

On Wed, Feb 14, 2024 at 3:52 PM David Li <lidav...@apache.org> wrote:
> I've put up a candidate implementation sans integration test [1].
>
> Some caveats:
> - java.net.URI doesn't accept 'scheme://', only 'scheme:/' or 'scheme://?' (yes, an empty query string pacifies it). I've chosen the latter since the former is technically a URI with a non-empty path, but neither is ideal.
> - I've changed the scheme to 'arrow-flight-reuse-connection' to be more faithful to the intended use than 'fallback'.
>
> [1]: https://github.com/apache/arrow/pull/40084
>
> On Tue, Feb 13, 2024, at 13:01, Jean-Baptiste Onofré wrote:
> > Hi David,
> >
> > It's reasonable. I think we can start with your initial proposal (it sounds fine to me) and we can always improve step by step.
> >
> > Thanks!
> > Regards
> > JB
> >
> > On Tue, Feb 13, 2024 at 4:53 PM David Li <lidav...@apache.org> wrote:
> >>
> >> I'm going to keep the proposal as-is then. It can be extended if this use case comes up.
> >>
> >> I'll start work on candidate implementations now.
> >>
> >> On Tue, Feb 13, 2024, at 03:22, Antoine Pitrou wrote:
> >> > I think the original proposal is sufficient.
> >> >
> >> > Also, it is not obvious to me how one would switch from e.g. grpc+tls to http without an explicit server location (unless both Flight servers are hosted under the same port?). So the "+" proposal seems a bit weird.
> >> >
> >> >
> >> > On 12/02/2024 at 23:39, David Li wrote:
> >> >> The idea is that the client would reuse the existing connection, in which case the protocol and such are implicit. (If the client doesn't have a connection anymore, it can't use the fallback anyway.)
> >> >>
> >> >> I suppose this has the advantage that you could "fall back" to a known hostname with a different protocol, but I'm not sure that always applies anyway. (Correct me if I'm wrong, Matt, but as I recall, UCX addresses aren't hostnames but rather opaque byte blobs, for instance.)
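To make David's java.net.URI caveat concrete, the small Java program below feeds the three spellings of the new scheme to java.net.URI and reports which ones parse. It is written for illustration and is not taken from the candidate implementation in [1]; the class and method names are made up, and only java.net.URI itself is a real API.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ReuseConnectionUriCheck {
    // True if java.net.URI accepts the string, false if its parser throws.
    static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String scheme = "arrow-flight-reuse-connection";
        String[] candidates = {
            scheme + "://",   // empty authority with nothing after it
            scheme + ":/",    // parses, but the path is "/" (non-empty)
            scheme + "://?",  // empty authority followed by an empty query
        };
        for (String s : candidates) {
            // Expected: rejected, accepted, accepted
            System.out.println(s + " -> " + (parses(s) ? "accepted" : "rejected"));
        }
    }
}
```

The rejection of 'scheme://' comes from java.net.URI requiring a non-empty authority after '//' unless a path, query, or fragment follows it, which is why an empty query string is enough to make the URI parse.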
> >> >>
> >> >> If we do prefer this, to avoid overloading the hostname, there's also the informal convention of using + in the scheme, so it could be arrow-flight-fallback+grpc+tls://, arrow-flight-fallback+http://, etc.
> >> >>
> >> >> On Mon, Feb 12, 2024, at 17:03, Joel Lubinitsky wrote:
> >> >>> Thanks for clarifying.
> >> >>>
> >> >>> Given the relationship between these two proposals, would it also be necessary to distinguish the scheme (or schemes) supported by the originating Flight RPC service?
> >> >>>
> >> >>> If that is the case, it may be preferable to use the "host" portion of the URI rather than the "scheme" to denote the location of the data. In this scenario, the host "0.0.0.0" could be used. This IP address is defined in IETF RFC 1122 [1] as "This host on this network", which seems most consistent with the intended use case. There are some caveats to this usage, but in my experience it's not uncommon for protocols to extend the definition of this address in their own usage.
> >> >>>
> >> >>> A benefit of this convention is that the scheme remains available in the URI to specify the transport available. For example, the following list of locations may be included in the response:
> >> >>>
> >> >>> ["grpc://0.0.0.0", "ucx://0.0.0.0", "grpc://1.2.3.4", <other_locations>...]
> >> >>>
> >> >>> This would indicate that the grpc and ucx transports are available from the current service, grpc is available at 1.2.3.4, and possibly more combinations of scheme/host.
> >> >>>
> >> >>> [1] https://datatracker.ietf.org/doc/html/rfc1122#section-3.2.1.3
> >> >>>
> >> >>> On Mon, Feb 12, 2024 at 2:53 PM David Li <lidav...@apache.org> wrote:
> >> >>>
> >> >>>> Ah, while I was thinking of it as useful for a fallback, I'm not specifying it that way. Better ideas for names would be appreciated.
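Joel's alternative, which the thread ultimately did not adopt, moves the "this service" marker from the scheme to the host. A minimal Java sketch of how a client might read his example list under that convention follows. Treating "0.0.0.0" as "the originating service" is his proposed convention, not established Flight behavior; the class and method names are hypothetical, and List.of assumes Java 9 or later.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class ThisHostLocations {
    // Hypothetical convention from Joel's proposal: host "0.0.0.0" means
    // "the Flight RPC service that returned this endpoint".
    static final String THIS_HOST = "0.0.0.0";

    static boolean isThisService(URI location) {
        return THIS_HOST.equals(location.getHost());
    }

    // Transports (URI schemes) that the originating service would be advertising.
    static List<String> localTransports(List<URI> locations) {
        List<String> schemes = new ArrayList<>();
        for (URI location : locations) {
            if (isThisService(location)) {
                schemes.add(location.getScheme());
            }
        }
        return schemes;
    }

    public static void main(String[] args) {
        List<URI> locations = List.of(
                URI.create("grpc://0.0.0.0"),
                URI.create("ucx://0.0.0.0"),
                URI.create("grpc://1.2.3.4"));
        System.out.println(localTransports(locations)); // [grpc, ucx]
        for (URI location : locations) {
            System.out.println(location + " -> "
                    + (isThisService(location) ? "this service" : "remote host " + location.getHost()));
        }
    }
}
```

The scheme stays free to name the transport, which is the benefit Joel describes; the cost is that "0.0.0.0" gains a Flight-specific meaning.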
> >> >>>>
> >> >>>> The actual precedence has never been specified. All endpoints are equivalent, so clients may use what is "best". For instance, with Matt Topol's concurrent proposal, a GPU-enabled client may preferentially try UCX endpoints while other clients may choose to ignore them entirely (e.g. because they don't have UCX installed).
> >> >>>>
> >> >>>> In practice the ADBC/JDBC drivers just scan the list left to right and try each endpoint in turn for lack of a better heuristic.
> >> >>>>
> >> >>>> On Mon, Feb 12, 2024, at 14:28, Joel Lubinitsky wrote:
> >> >>>>> Thanks for proposing this, David.
> >> >>>>>
> >> >>>>> I think the ability to include the Flight RPC service itself in the list of endpoints from which data can be fetched is a helpful addition.
> >> >>>>>
> >> >>>>> The current choice of name for the URI (arrow-flight-fallback://) seems to imply that there is an order of precedence that should be considered in the list of URIs. Specifically, as a developer receiving the list of locations, I might assume that I should try fetching from other locations first. If those do not succeed, I may try the original service as a fallback.
> >> >>>>>
> >> >>>>> Are these the intended semantics? If so, is there a way to include the original service in the list of locations without the implied precedence?
> >> >>>>>
> >> >>>>> Thanks,
> >> >>>>> Joel
> >> >>>>>
> >> >>>>> On Mon, Feb 12, 2024 at 11:52 James Duong <james.du...@improving.com.invalid> wrote:
> >> >>>>>
> >> >>>>>> This seems like a good idea, and also improves consistency with clients that erroneously assumed that the service endpoint was always in the list of endpoints.
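David's description of client behavior (a plain left-to-right scan by default, with clients free to prefer some transports such as UCX) can be sketched as two small functions. In this hypothetical Java example, the Fetcher interface stands in for opening a Flight connection and calling DoGet; it is not an Arrow API, and the addresses are placeholders.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class LocationOrder {
    // Hypothetical stand-in for "connect to this location and fetch the data";
    // returns empty if the location could not be used.
    interface Fetcher {
        Optional<String> tryFetch(URI location);
    }

    // Stable reordering: locations matching `preferred` come first,
    // and all locations keep their original relative order otherwise.
    static List<URI> prefer(List<URI> locations, Predicate<URI> preferred) {
        List<URI> first = new ArrayList<>();
        List<URI> rest = new ArrayList<>();
        for (URI location : locations) {
            (preferred.test(location) ? first : rest).add(location);
        }
        first.addAll(rest);
        return first;
    }

    // Scan left to right and return the first successful fetch.
    static Optional<String> fetchFirst(List<URI> locations, Fetcher fetcher) {
        for (URI location : locations) {
            Optional<String> result = fetcher.tryFetch(location);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<URI> locations = List.of(
                URI.create("grpc://1.2.3.4"),
                URI.create("grpc://9.9.9.9"),
                URI.create("ucx://5.6.7.8"));
        // Pretend 1.2.3.4 is unreachable and every other location works.
        Fetcher fake = loc -> "1.2.3.4".equals(loc.getHost())
                ? Optional.<String>empty()
                : Optional.of("data from " + loc);

        // Plain left-to-right scan: prints "data from grpc://9.9.9.9"
        System.out.println(fetchFirst(locations, fake).orElse("no location worked"));
        // A UCX-capable client trying ucx:// first: prints "data from ucx://5.6.7.8"
        List<URI> ucxFirst = prefer(locations, loc -> "ucx".equals(loc.getScheme()));
        System.out.println(fetchFirst(ucxFirst, fake).orElse("no location worked"));
    }
}
```

Because precedence is unspecified, both orderings are valid readings of the same endpoint list; the reordering is purely a client-side choice.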
> >> >>>>>>
> >> >>>>>> From: Antoine Pitrou <anto...@python.org>
> >> >>>>>> Date: Monday, February 12, 2024 at 6:05 AM
> >> >>>>>> To: dev@arrow.apache.org <dev@arrow.apache.org>
> >> >>>>>> Subject: Re: [DISCUSS] Flight RPC: add 'fallback' URI scheme
> >> >>>>>>
> >> >>>>>> Hello,
> >> >>>>>>
> >> >>>>>> This looks fine to me.
> >> >>>>>>
> >> >>>>>> Regards
> >> >>>>>>
> >> >>>>>> Antoine.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On 12/02/2024 at 14:46, David Li wrote:
> >> >>>>>>> Hello,
> >> >>>>>>>
> >> >>>>>>> I'd like to propose a slight update to Flight RPC to make Flight SQL work better in different deployment scenarios. Comments on the doc would be appreciated:
> >> >>>>>>>
> >> >>>>>>> https://docs.google.com/document/d/1g9M9FmsZhkewlT1mLibuceQO8ugI0-fqumVAXKFjVGg/edit?usp=sharing
> >> >>>>>>>
> >> >>>>>>> The gist is that FlightEndpoint allows specifying either (1) a list of concrete URIs to fetch data from or (2) no URIs, meaning to fetch from the Flight RPC service itself; but it would be useful to combine both behaviors (try these concrete URIs and fall back to the Flight RPC service itself) without requiring the service to know its own public address.
> >> >>>>>>>
> >> >>>>>>> Best,
> >> >>>>>>> David
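The two existing FlightEndpoint behaviors from David's original message, plus the proposed special location, can be sketched as a client-side resolution step. This is an illustrative Java sketch, not the Arrow Flight API: Target, Kind, and resolve are hypothetical names, the hostname is a placeholder, and the scheme string is the one David's Feb 14 message says the candidate implementation uses (spelled with '://?' per his java.net.URI caveat). It assumes Java 16 or later for records.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class EndpointResolution {
    // Scheme named in the candidate implementation; may still be renamed.
    static final String REUSE_SCHEME = "arrow-flight-reuse-connection";

    enum Kind { REUSE_CURRENT_CONNECTION, CONNECT }

    // Where to send DoGet: the existing connection, or a new one to `uri`.
    record Target(Kind kind, URI uri) {
        static Target reuse() { return new Target(Kind.REUSE_CURRENT_CONNECTION, null); }
        static Target connect(URI uri) { return new Target(Kind.CONNECT, uri); }
    }

    static List<Target> resolve(List<URI> locations) {
        List<Target> targets = new ArrayList<>();
        if (locations.isEmpty()) {
            // Behavior (2): no locations means fetch from the Flight RPC service itself.
            targets.add(Target.reuse());
            return targets;
        }
        for (URI location : locations) {
            if (REUSE_SCHEME.equals(location.getScheme())) {
                // Proposed addition: an explicit "the service itself" entry that
                // can sit alongside concrete locations, with no public address needed.
                targets.add(Target.reuse());
            } else {
                // Behavior (1): a concrete location to connect to.
                targets.add(Target.connect(location));
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        List<URI> locations = List.of(
                URI.create("grpc+tls://data-node-1.example.com:8815"),
                URI.create(REUSE_SCHEME + "://?"));
        for (Target t : resolve(locations)) {
            System.out.println(t);
        }
        System.out.println(resolve(List.of()));
    }
}
```

Since the special entry is just another location, a client that scans left to right (as David says the ADBC/JDBC drivers do) tries it in whatever position the server puts it; nothing in the scheme itself implies precedence, which is the concern Joel raised about the original 'fallback' name.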