Re: [I] Proposal to Introduce Ray SQL into DataFusion Python [datafusion-python]

via GitHub Wed, 18 Sep 2024 04:12:48 -0700


franklsf95 commented on issue #872:
URL: 
https://github.com/apache/datafusion-python/issues/872#issuecomment-2358185385


   Really excited to see this happening! I contributed some code to Ray SQL 
last year (most notably, making distributed shuffle work using Ray's 
distributed object store), and I can help answer any questions about Ray (I 
work with people who build Ray, Ray Data, etc.).
   
   At a high level, building on top of Ray gives you a distributed execution 
substrate for free. Ray handles cluster management, task scheduling, 
distributed memory management, and fault tolerance, to name a few. This would 
basically mean using DataFusion as a single-node query execution engine and 
building all the distributed pieces in Python on top of Ray. If this is in line 
with the goals of this project (or the DataFusion project), then I think it 
would be a good way to go.
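   
   To make the layering concrete, here is a minimal sketch of the split 
described above. It is only an illustration under loud assumptions: it imports 
neither Ray nor DataFusion. The standard library's `sqlite3` stands in for the 
single-node query engine, `ThreadPoolExecutor` stands in for the task 
scheduler, and the names `run_local_query` and `distributed_sum` are 
hypothetical. In a real setup the per-partition function would be a 
`@ray.remote` task whose results are collected with `ray.get`.

```python
import sqlite3
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_local_query(rows):
    """Single-node step: aggregate one partition with a local SQL engine.

    sqlite3 is only a stand-in here for the per-node engine (DataFusion).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (k TEXT, v INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    partial = conn.execute("SELECT k, SUM(v) FROM t GROUP BY k").fetchall()
    conn.close()
    return partial


def distributed_sum(partitions):
    """Distributed step, written in plain Python.

    Fans out one task per partition, then merges the partial sums.
    The thread pool is a stand-in for a scheduler such as Ray.
    """
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(run_local_query, partitions))

    totals = Counter()
    for partial in partials:
        for key, subtotal in partial:
            totals[key] += subtotal
    return dict(totals)


partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
]
result = distributed_sum(partitions)
```

   The point of the sketch is the division of labor: the SQL engine only ever 
sees one partition, while partitioning, fan-out, and merging live in the Python 
layer, which is the part Ray would take over.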
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


