adriangb commented on issue #173: URL: https://github.com/apache/arrow-ballista/issues/173#issuecomment-1396138349
> Where would the HTTP server be hosted? Scheduler? Single Executor process? Multiple Executor processes? An entirely new process?

I'd think it'd be very similar to an executor process (a UDF executor process?). But I'm not super familiar with how executors run now, how they scale, etc.

> Wouldn't it make more sense to have that communication channel be Arrow flight instead of HTTP?

With the example above I'd say it can be Arrow Flight or any other communication protocol; I'm not constraining it to HTTP. It would be really cool, though, if as a user I could deploy one thing that can talk over Arrow Flight for use as a UDF but also serve an HTTP API.

> Seems like this would introduce a large performance hit having to make an external (data movement) remote invocation for each RecordBatch since the data would need to be moved to the HTTP UDF service running on another host.

Yeah, this I don't know about. Would the data currently be residing in the Executor? If so, the only way to do everything in memory would be to run these UDFs on the Executor itself, which gets into the complication of dependency management. Could users wrap the executor itself? Maybe when `UDFExecutorApp` registers itself with the scheduler it _becomes_ an executor that executes regular queries but also the UDFs it is running locally? It might get confusing if different executors have different UDFs available...
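
To make that last concern concrete, here is a minimal sketch (not Ballista's actual API) of how a scheduler might track which UDFs each executor advertises at registration and only route a task to executors that have every UDF it needs. All names here (`Scheduler`, `register_executor`, `executors_for`, and the executor and UDF names) are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    """Hypothetical scheduler; an illustration, not Ballista's real API."""

    # executor id -> names of the UDFs that executor has loaded locally
    udfs_by_executor: dict[str, set[str]] = field(default_factory=dict)

    def register_executor(self, executor_id: str, udfs: set[str]) -> None:
        # An executor advertises its locally available UDFs when it registers.
        self.udfs_by_executor[executor_id] = set(udfs)

    def executors_for(self, required_udfs: set[str]) -> list[str]:
        # Only executors that have every UDF the task needs are eligible.
        return sorted(
            executor_id
            for executor_id, available in self.udfs_by_executor.items()
            if required_udfs <= available
        )


scheduler = Scheduler()
scheduler.register_executor("plain-executor", set())
scheduler.register_executor("udf-executor-a", {"geocode"})
scheduler.register_executor("udf-executor-b", {"geocode", "sentiment"})

any_executor = scheduler.executors_for(set())
geocode_only = scheduler.executors_for({"geocode"})
both_udfs = scheduler.executors_for({"geocode", "sentiment"})
unknown = scheduler.executors_for({"missing_udf"})
```

In this sketch a query with no UDFs can run on any executor, while a query that uses a UDF can only be placed on the subset that registered it, and a UDF nobody registered leaves no eligible executor at all. That is the part that could get confusing operationally.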
