franklsf95 commented on issue #872: URL: https://github.com/apache/datafusion-python/issues/872#issuecomment-2358185385
Really excited to see this happening! I contributed some code to Ray SQL last year (most notably, making distributed shuffle work using Ray's distributed object store), and I can help answer any questions about Ray (I work with the people who build Ray, Ray Data, etc.).

At a high level, by building on top of Ray, you get a distributed execution substrate for free. Ray handles cluster management, task scheduling, distributed memory management, and fault tolerance, to name a few. This would basically mean using DataFusion as a single-node query execution engine and building all the distributed pieces in Python on top of Ray. If this is in line with the goal of this project (or the DataFusion project), then I think it would be a good way to go.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
