> (Correct me if I'm wrong Matt, but as I recall, UCX addresses aren't
> hostnames but rather opaque byte blobs, for instance.)

You can use a hostname and port to create a UCX connection, but there is
separately an address object. A UCX address object is an opaque byte blob
that includes a whole mess of available transport and config information.
Using it instead of the host/port string basically lets the connection
skip that config exchange.

I'd be okay with the + convention as an idea.
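
For illustration only, here is a minimal sketch of how a client might pull
apart the scheme forms David lists below (arrow-flight-fallback+grpc+tls://
and so on). The helper name split_location is hypothetical, and the
arrow-flight-fallback prefix is just the name floated in this thread, not
anything finalized.

```python
from urllib.parse import urlsplit

# Assumption: prefix name taken from this thread; it is not a finalized spec.
FALLBACK_PREFIX = "arrow-flight-fallback"


def split_location(uri):
    """Return (is_fallback, transports, netloc) for a Flight location URI.

    Hypothetical helper: splits the scheme on '+' and treats a leading
    FALLBACK_PREFIX component as "reuse the originating service".
    """
    parts = urlsplit(uri)
    scheme_parts = parts.scheme.split("+")
    is_fallback = scheme_parts[0] == FALLBACK_PREFIX
    # Everything after the prefix (if present) names the transport stack.
    transports = scheme_parts[1:] if is_fallback else scheme_parts
    return is_fallback, transports, parts.netloc


for uri in ["arrow-flight-fallback+grpc+tls://", "grpc://1.2.3.4:8815"]:
    print(uri, "->", split_location(uri))
```

With this shape the host portion stays empty for the fallback entries, so
nothing about the hostname is overloaded.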

On Mon, Feb 12, 2024, 5:40 PM David Li <lidav...@apache.org> wrote:

> The idea is that the client would reuse the existing connection, in which
> case the protocol and such are implicit. (If the client doesn't have a
> connection anymore, it can't use the fallback anyways.)
>
> I suppose this has the advantage that you could "fall back" to a known
> hostname with a different protocol, but I'm not sure that always applies
> anyways. (Correct me if I'm wrong Matt, but as I recall, UCX addresses
> aren't hostnames but rather opaque byte blobs, for instance.)
>
> If we do prefer this, to avoid overloading the hostname, there's also the
> informal convention of using + in the scheme, so it could be
> arrow-flight-fallback+grpc+tls://, arrow-flight-fallback+http://, etc.
>
> On Mon, Feb 12, 2024, at 17:03, Joel Lubinitsky wrote:
> > Thanks for clarifying.
> >
> > Given the relationship between these two proposals, would it also be
> > necessary to distinguish the scheme (or schemes) supported by the
> > originating Flight RPC service?
> >
> > If that is the case, it may be preferred to use the "host" portion of the
> > URI rather than the "scheme" to denote the location of the data. In this
> > scenario, the host "0.0.0.0" could be used. This IP address is defined in
> > IETF RFC1122 [1] as "This host on this network", which seems most
> > consistent with the intended use-case. There are some caveats to this
> > usage but in my experience it's not uncommon for protocols to extend
> > the definition of this address in their own usage.
> >
> > A benefit of this convention is that the scheme remains available in the
> > URI to specify the transport available. For example, the following list
> > of locations may be included in the response:
> >
> > ["grpc://0.0.0.0", "ucx://0.0.0.0", "grpc://1.2.3.4", <other_locations>...]
> >
> > This would indicate that grpc and ucx transport is available from the
> > current service, grpc is available at 1.2.3.4, and possibly more
> > combinations of scheme/host.
> >
> > [1] https://datatracker.ietf.org/doc/html/rfc1122#section-3.2.1.3
> >
> > On Mon, Feb 12, 2024 at 2:53 PM David Li <lidav...@apache.org> wrote:
> >
> >> Ah, while I was thinking of it as useful for a fallback, I'm not
> >> specifying it that way.  Better ideas for names would be appreciated.
> >>
> >> The actual precedence has never been specified. All endpoints are
> >> equivalent, so clients may use what is "best". For instance, with Matt
> >> Topol's concurrent proposal, a GPU-enabled client may preferentially try
> >> UCX endpoints while other clients may choose to ignore them entirely
> >> (e.g. because they don't have UCX installed).
> >>
> >> In practice the ADBC/JDBC drivers just scan the list left to right and
> >> try each endpoint in turn for lack of a better heuristic.
> >>
> >> On Mon, Feb 12, 2024, at 14:28, Joel Lubinitsky wrote:
> >> > Thanks for proposing this David.
> >> >
> >> > I think the ability to include the Flight RPC service itself in the
> >> > list of endpoints from which data can be fetched is a helpful addition.
> >> >
> >> > The current choice of name for the URI (arrow-flight-fallback://) seems
> >> > to imply that there is an order of precedence that should be considered
> >> > in the list of URIs. Specifically, as a developer receiving the list of
> >> > locations, I might assume that I should try fetching from other
> >> > locations first. If those do not succeed, I may try the original
> >> > service as a fallback.
> >> >
> >> > Are these the intended semantics? If so, is there a way to include the
> >> > original service in the list of locations without the implied
> >> > precedence?
> >> >
> >> > Thanks,
> >> > Joel
> >> >
> >> > On Mon, Feb 12, 2024 at 11:52 James Duong
> >> > <james.du...@improving.com.invalid> wrote:
> >> >
> >> >> This seems like a good idea, and also improves consistency with
> >> >> clients that erroneously assumed that the service endpoint was always
> >> >> in the list of endpoints.
> >> >>
> >> >> From: Antoine Pitrou <anto...@python.org>
> >> >> Date: Monday, February 12, 2024 at 6:05 AM
> >> >> To: dev@arrow.apache.org <dev@arrow.apache.org>
> >> >> Subject: Re: [DISCUSS] Flight RPC: add 'fallback' URI scheme
> >> >>
> >> >> Hello,
> >> >>
> >> >> This looks fine to me.
> >> >>
> >> >> Regards
> >> >>
> >> >> Antoine.
> >> >>
> >> >>
> >> >> Le 12/02/2024 à 14:46, David Li a écrit :
> >> >> > Hello,
> >> >> >
> >> >> > I'd like to propose a slight update to Flight RPC to make Flight SQL
> >> >> > work better in different deployment scenarios.  Comments on the doc
> >> >> > would be appreciated:
> >> >> >
> >> >> > https://docs.google.com/document/d/1g9M9FmsZhkewlT1mLibuceQO8ugI0-fqumVAXKFjVGg/edit?usp=sharing
> >> >> >
> >> >> > The gist is that FlightEndpoint allows specifying either (1) a list
> >> >> > of concrete URIs to fetch data from or (2) no URIs, meaning to fetch
> >> >> > from the Flight RPC service itself; but it would be useful to combine
> >> >> > both behaviors (try these concrete URIs and fall back to the Flight
> >> >> > RPC service itself) without requiring the service to know its own
> >> >> > public address.
> >> >> >
> >> >> > Best,
> >> >> > David
> >> >>
> >>
>
