Thanks Andy! Very helpful.

You have hit on one of the questions that we've been wrestling with: which tools would consume Drill data as Arrow? More generally, what are the use cases for Arrow data interchange?

Flight makes sense for transferring large data sets, such as in exchanges within a distributed engine, or from a "data service" such as a hypothetical Flight-based S3 Select. Flight (and Arrow in general) seems less useful as a client API for things like BI tools, dashboards and the like; xDBC seems like a better fit since such tools will consume "human-sized" result sets.

The article in your link notes that there is a Spark consumer for Flight. Drill's use case would likely be similar -- both tools could consume large data sets from Flight-enabled sources.

As for Drill as a producer, one could conjure an example in which Spark reads data from Drill. Maybe Drill runs a number of complex SQL queries to produce data sets upon which Spark runs some ML tasks. Drill is probably a better tool to run the kind of monster SQL statements that business analysts like to create, but Spark is better for the kind of algorithmic processing typical of ML. (One could argue, with Flight, you get the best of both worlds. Charles, we need your insight here.) Perhaps Flight's creators have similar scenarios in mind.

More practically, between the example flight server you mentioned (as a producer) and Spark (as a consumer), we have what we need if someone wants to create the prototypes we mentioned. Or, if someone wants to get very meta, we can have Drill using Flight to read from another Drill. Not sure it's useful, but would be a cool demo.

Thanks,
- Paul

On Monday, January 13, 2020, 04:21:29 PM PST, Andy Grove <andygrov...@gmail.com> wrote:

Hi Paul,

There is a test flight server in the Arrow Java project [1] that might be a good starting point, although I haven't used it myself.

I was looking at Arrow Flight for my Ballista Poc [2] although I don't really have time to spend on that right now.

I'm less sure of the value of having an Arrow consumer for Drill since any vectorized processing would already have been performed by Drill? I may be missing something though.

Thanks,

Andy.

[1] https://github.com/apache/arrow/tree/master/java/flight/flight-core#example-usage
[2] https://github.com/andygrove/ballista