Hello,

I'm interested in using Flight to serve large amounts of data in a
parallelised manner, and I'm building some Python prototypes based on
https://github.com/apache/arrow/blob/apache-arrow-0.17.1/python/examples/flight

In my use case, we'd have a number of worker servers serving several
different datasets (here called "datasetA" and "datasetB"). Each query can
also carry additional parameters that customise it (e.g. a date range if
the dataset is a time series, but other things too, depending on the
dataset).

The idea is for clients to send their entire query to a single coordinator
(e.g. datasetA + [1970, 2020]) and then be told to contact several workers,
each with a slice of it, e.g. {worker1: (datasetA, [1970, 1990)),
worker2: (datasetA, [1990, 2020])}. In other words, I want to split the
original request into a few smaller ones, each handled by a different
worker, which retrieves its data from a DB and sends it back to the client,
which then aggregates the results.
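For the splitting step, here is a minimal sketch in plain Python. It assumes the date range is cut into contiguous, inclusive, per-day slices of roughly equal length. The function name `split_date_range` and the worker names are hypothetical, and nothing here is Flight-specific yet:

```python
from datetime import date, timedelta


def split_date_range(dataset, date_from, date_to, workers):
    """Split the inclusive range [date_from, date_to] into contiguous,
    non-overlapping slices, at most one per worker.

    Returns {worker: (dataset, slice_start, slice_end)}, both ends inclusive.
    """
    total_days = (date_to - date_from).days + 1
    if not workers or total_days <= 0:
        raise ValueError("need at least one worker and a non-empty date range")
    # Never hand out more slices than there are days.
    n = min(len(workers), total_days)
    base, extra = divmod(total_days, n)
    plan = {}
    start = date_from
    for i, worker in enumerate(workers[:n]):
        # The first `extra` slices get one extra day so every day is covered.
        length = base + (1 if i < extra else 0)
        end = start + timedelta(days=length - 1)
        plan[worker] = (dataset, start, end)
        start = end + timedelta(days=1)
    return plan


plan = split_date_range("datasetA", date(1970, 1, 1), date(2020, 6, 23),
                        ["worker1", "worker2"])
for worker, (ds, start, end) in plan.items():
    print(worker, ds, start, end)
```

Inclusive day boundaries sidestep the gap/overlap question that mixing `[1970, 1990)` and `[1990, 2020]` raises. Splitting by expected row count rather than by days might balance load better.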

Although I'm prototyping in Python, this needs to work from a variety of
platforms.
Does that sound like something Flight should be able to do well?

If so, what are the intended semantics for the descriptor, ticket, etc.,
given my example above? I see idioms for "path" and "cmd", but neither
really seems to fit. My query is more like some opaque JSON, e.g. something
you'd submit to an HTTP server. Is the idea to send a serialised string of
something like the following?

{
  "dataset": "datasetA",
  "dateFrom": "1970-01-01",
  "dateTo": "2020-06-23"
}
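As a sketch of the JSON idea above, the query can be encoded as canonical UTF-8 JSON bytes and then decoded and validated on the server. In pyarrow, such bytes could be passed to `pyarrow.flight.FlightDescriptor.for_command(...)` and read back from the descriptor's `command` attribute. Whether that is the intended use of "cmd" is exactly the open question here. The required key names mirror the example and are assumptions:

```python
import json
from datetime import date

# Hypothetical query schema, mirroring the example above.
REQUIRED_KEYS = ("dataset", "dateFrom", "dateTo")


def encode_query(query):
    """Serialise a query dict to bytes for use as an opaque command."""
    # Sorted keys and compact separators make equal queries encode identically.
    return json.dumps(query, sort_keys=True, separators=(",", ":")).encode("utf-8")


def decode_query(command):
    """Parse and validate command bytes back into a query dict."""
    query = json.loads(command.decode("utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in query]
    if missing:
        raise ValueError(f"query is missing keys: {missing}")
    # Convert ISO-8601 strings to dates so the server can split the range.
    query["dateFrom"] = date.fromisoformat(query["dateFrom"])
    query["dateTo"] = date.fromisoformat(query["dateTo"])
    return query


command = encode_query({"dataset": "datasetA",
                        "dateFrom": "1970-01-01",
                        "dateTo": "2020-06-23"})
print(command)
print(decode_query(command))
```

A stable encoding matters if the coordinator ever caches or deduplicates requests by their descriptor bytes.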

In that case, what should listFlights return, given that the queries are
dynamic? Something like ["datasetA", "datasetB", ...]?
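One way to sketch the list suggested above is to keep a static catalog and yield one entry per dataset, as a path-style descriptor plus the parameters a query against it may set. Several details are hypothetical: the catalog contents (including datasetB's `region` parameter) and the convention of treating the criteria bytes as a name prefix. In pyarrow, a server would do this inside its `FlightServerBase.list_flights(context, criteria)` override and yield a `FlightInfo` for each entry:

```python
# Hypothetical catalog: dataset name -> parameters a query against it may set.
CATALOG = {
    "datasetA": {"dateFrom": "date", "dateTo": "date"},
    "datasetB": {"region": "string"},
}


def list_datasets(criteria=b""):
    """Yield (path, parameters) for each dataset whose name starts with the
    criteria bytes decoded as UTF-8 (an assumed convention)."""
    prefix = criteria.decode("utf-8")
    for name in sorted(CATALOG):
        if name.startswith(prefix):
            # A path descriptor is a list of strings; here it is just the name.
            yield [name], CATALOG[name]


for path, params in list_datasets():
    print(path, params)
```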

I guess I'm mainly struggling to understand what a descriptor, a ticket and
a flight really are in my context, and I can't find this in the docs.
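For reference, the usual Flight flow works like this. The client sends a descriptor to GetFlightInfo on the coordinator. It gets back a FlightInfo whose endpoints each pair a ticket with one or more locations. It then calls DoGet with each ticket at its location. Under that reading, the sketch below shows how the coordinator might encode each worker's slice into a ticket and how a worker might decode it. The JSON ticket format, host names and port are hypothetical. In pyarrow, each pair would become `flight.FlightEndpoint(flight.Ticket(ticket_bytes), [location])`:

```python
import json
from datetime import date

# Hypothetical worker addresses.
WORKER_URIS = {
    "worker1": "grpc+tcp://worker1:8815",
    "worker2": "grpc+tcp://worker2:8815",
}


def make_endpoints(plan, worker_uris):
    """Turn {worker: (dataset, start, end)} into (ticket_bytes, [uri]) pairs."""
    endpoints = []
    for worker, (dataset, start, end) in plan.items():
        # The ticket is opaque to the client; only the worker must understand it.
        ticket = json.dumps({"dataset": dataset,
                             "dateFrom": start.isoformat(),
                             "dateTo": end.isoformat()},
                            sort_keys=True).encode("utf-8")
        endpoints.append((ticket, [worker_uris[worker]]))
    return endpoints


def read_ticket(ticket):
    """What a worker's DoGet handler might do: recover its slice."""
    fields = json.loads(ticket.decode("utf-8"))
    return (fields["dataset"],
            date.fromisoformat(fields["dateFrom"]),
            date.fromisoformat(fields["dateTo"]))


plan = {
    "worker1": ("datasetA", date(1970, 1, 1), date(1989, 12, 31)),
    "worker2": ("datasetA", date(1990, 1, 1), date(2020, 6, 23)),
}
for ticket, locations in make_endpoints(plan, WORKER_URIS):
    print(locations, read_ticket(ticket))
```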
Just a link to some good docs would obviously be great as well! I'm looking at
https://arrow.apache.org/docs/python/api/flight.html, which is largely
empty. It also says "Flight is currently not distributed as part of wheels
or in Conda - it is only available when built from source appropriately.",
which seems a bit pessimistic, as it appears to be present in both the PyPI
and conda 0.17.1 packages I checked.

Cheers,
-Joris.
