Thanks Andy!

Very helpful. You have hit on one of the questions that we've been wrestling 
with: which tools would consume Drill data as Arrow? More generally, what are 
the use cases for Arrow data interchange?

Flight makes sense for transferring large data sets, such as in exchanges 
within a distributed engine, or from a "data service" such as a hypothetical 
Flight-based S3 Select. Flight (and Arrow in general) seems less useful as a 
client API for things like BI tools, dashboards and the like; xDBC seems like a 
better fit since such tools will consume "human-sized" result sets.

The article in your link notes that there is a Spark consumer for Flight. 
Drill's use case would likely be similar -- both tools could consume large data 
sets from Flight-enabled sources.

As for Drill as a producer, one could conjure an example in which Spark reads 
data from Drill. Maybe Drill runs a number of complex SQL queries to produce 
data sets upon which Spark runs some ML tasks. Drill is probably a better tool 
to run the kind of monster SQL statements that business analysts like to 
create, but Spark is better for the kind of algorithmic processing typical of 
ML. (One could argue, with Flight, you get the best of both worlds. Charles, we 
need your insight here.) Perhaps Flight's creators have similar scenarios in 
mind.

More practically, between the example Flight server you mentioned (as a 
producer) and Spark (as a consumer), we have what we need if someone wants to 
create the prototypes we mentioned.

Or, if someone wants to get very meta, we can have Drill use Flight to read 
from another Drill. I'm not sure it's useful, but it would be a cool demo.

Thanks,

- Paul

 

    On Monday, January 13, 2020, 04:21:29 PM PST, Andy Grove 
<andygrov...@gmail.com> wrote:  
 
 Hi Paul,

There is a test Flight server in the Arrow Java project [1] that might be a
good starting point, although I haven't used it myself. I was looking at
Arrow Flight for my Ballista PoC [2], although I don't really have time to
spend on that right now.

I'm less sure of the value of having an Arrow consumer for Drill, since any
vectorized processing would already have been performed by Drill. I may be
missing something, though.

Thanks,

Andy.

[1]
https://github.com/apache/arrow/tree/master/java/flight/flight-core#example-usage
[2] https://github.com/andygrove/ballista


  
