My initial inclination is towards #3, but I'd be curious what others think. In the case of #3, I wonder if it makes sense to then drop the Schema from the GetFlightInfo response...
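Here is a minimal in-memory Python sketch of what #3 could look like. It does not use pyarrow.flight. The class and method names (`SketchFlightServer`, `get_schema`, `get_flight_info`) are hypothetical stand-ins for the RPCs, and a "schema" is modelled as a list of (name, type) pairs. The idea is that the schema gets its own cheap call. FlightInfo then only has to carry endpoints, once the data is ready.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical stand-in for an Arrow schema: ordered (column name, type name) pairs.
Schema = List[Tuple[str, str]]


@dataclass(frozen=True)
class FlightDescriptor:
    # Opaque command bytes, like a CMD-type descriptor carrying the query.
    cmd: bytes


@dataclass
class FlightInfo:
    # Under #3 the schema could be dropped from FlightInfo, leaving only
    # the endpoints (here plain "host:port" strings) to fetch data from.
    endpoints: List[str]


@dataclass
class SketchFlightServer:
    schemas: Dict[bytes, Schema] = field(default_factory=dict)
    endpoints: Dict[bytes, List[str]] = field(default_factory=dict)

    def get_schema(self, descriptor: FlightDescriptor) -> Schema:
        # Hypothetical GetSchema: answerable from planning alone, before data exists.
        return self.schemas[descriptor.cmd]

    def get_flight_info(self, descriptor: FlightDescriptor) -> FlightInfo:
        # GetFlightInfo: only called once the data is ready to be fetched.
        return FlightInfo(endpoints=list(self.endpoints[descriptor.cmd]))


server = SketchFlightServer(
    schemas={b"SELECT * FROM t": [("id", "int64"), ("name", "utf8")]},
    endpoints={b"SELECT * FROM t": ["node1:47470", "node2:47470"]},
)
desc = FlightDescriptor(b"SELECT * FROM t")
print(server.get_schema(desc))                 # [('id', 'int64'), ('name', 'utf8')]
print(server.get_flight_info(desc).endpoints)  # ['node1:47470', 'node2:47470']
```

Each call now has a single purpose, so neither response needs a "half-empty" shape.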
On Fri, Jun 28, 2019 at 10:57 AM Ryan Murray <rym...@dremio.com> wrote:
> Hi All,
>
> I have been working on building an Arrow Flight source for Spark. The goal
> here is for Spark to be able to use a group of Arrow Flight endpoints to
> pull a dataset over to Spark in parallel.
>
> I am unsure of the best model for the Spark <-> Flight conversation and
> wanted to get your opinion on the best way to go.
>
> I am breaking up the query to Flight from Spark into 3 parts:
> 1) get the schema using GetFlightInfo. This is needed to do further lazy
> operations in Spark.
> 2) get the endpoints by calling GetFlightInfo a 2nd time with a different
> argument. This returns the list of endpoints on the parallel Flight server.
> The endpoints are not available until the data is ready to be fetched, which
> happens after the schema is known but must happen before DoGet is called.
> 3) call get stream on all endpoints from 2
>
> I think I have to do each step; however, I don't like having to call getInfo
> twice, as it doesn't seem very elegant. I see a few options:
> 1) live with calling GetFlightInfo twice, with a custom bytes cmd to
> differentiate the purpose of each call
> 2) add an argument to GetFlightInfo to tell it it's being called only for
> the schema
> 3) add another RPC endpoint, i.e. GetSchema(FlightDescriptor), to return just
> the Schema in question
> 4) use DoAction and wrap the expected FlightInfo in a Result
>
> I am aware that 4 is probably the least disruptive, but I'm also not a fan,
> as (to me) it implies performing an action on the server side. Suggestions
> 2 & 3 are larger changes and I am reluctant to make them unless there is a
> consensus here. None of them are great options, and I am wondering what
> everyone thinks the best approach might be, particularly as I think this is
> likely to come up in more applications than just Spark.
>
> Best,
> Ryan
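Here is a rough Python sketch of the three-step flow under option 1, where a single GetFlightInfo entry point is disambiguated by the command bytes. It is only an in-memory model, not the Flight API. The prefixes `schema:` / `endpoints:` and the names `SketchServer` and `fetch` are hypothetical, and the "stream" from each endpoint is just a list of ints.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

# Hypothetical prefixes tagging the purpose of each GetFlightInfo call (option 1).
SCHEMA_PREFIX = b"schema:"
ENDPOINTS_PREFIX = b"endpoints:"


class SketchServer:
    """In-memory stand-in for a parallel Flight service; not the pyarrow.flight API."""

    def __init__(self, query: bytes, schema: List[Tuple[str, str]],
                 partitions: Dict[str, List[int]]):
        self.query = query
        self.schema = schema
        self.partitions = partitions  # endpoint -> rows held on that endpoint

    def get_flight_info(self, cmd: bytes) -> Tuple[Optional[list], List[str]]:
        # Returns (schema, endpoints); which half is filled depends on the prefix.
        if cmd == SCHEMA_PREFIX + self.query:
            return self.schema, []
        if cmd == ENDPOINTS_PREFIX + self.query:
            return None, sorted(self.partitions)
        raise KeyError(cmd)

    def do_get(self, endpoint: str) -> List[int]:
        # Stand-in for DoGet: return the rows stored on one endpoint.
        return list(self.partitions[endpoint])


def fetch(server: SketchServer, query: bytes):
    # Step 1: schema only, so Spark can keep planning lazily.
    schema, _ = server.get_flight_info(SCHEMA_PREFIX + query)
    # Step 2: a second GetFlightInfo call for the endpoints, once data is ready.
    _, endpoints = server.get_flight_info(ENDPOINTS_PREFIX + query)
    # Step 3: fetch from every endpoint in parallel; map() keeps endpoint order.
    with ThreadPoolExecutor() as pool:
        streams = list(pool.map(server.do_get, endpoints))
    return schema, endpoints, streams


server = SketchServer(b"SELECT * FROM t", [("id", "int64")],
                      {"node1:47470": [1, 2], "node2:47470": [3]})
schema, endpoints, streams = fetch(server, b"SELECT * FROM t")
print(schema, endpoints, streams)
# [('id', 'int64')] ['node1:47470', 'node2:47470'] [[1, 2], [3]]
```

Each response leaves one half unused, and the purpose of each call lives in a private byte convention. That is the inelegance options 2 and 3 try to remove.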