aditanase commented on issue #1103:
URL: 
https://github.com/apache/datafusion-python/issues/1103#issuecomment-2829650344

   @tespent I am very intrigued by how you're using DataFusion and ray.data 
together. If you can share, I'd like to learn more about the interplay of the 
two systems.
   
   We are also exploring this space. We started with datafusion-ray, then 
decided to write a simple ray.data.Datasource that wraps DataFusion for 
distributed execution of simple queries:
   - plan the SQL using DF
   - repartition and serialize the physical plan
   - execute each partition remotely
   - feed into ray.data.map_batches from here on
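
   As a rough illustration of the control flow in the steps above, here is a stdlib-only sketch. Everything in it is a hypothetical stand-in: `ToyPlan` plays the role of a serialized DataFusion physical plan, `pickle` stands in for plan serialization, a thread pool stands in for Ray remote tasks, and the local `map_batches` function mimics the per-batch step that ray.data would run. It is not our actual Datasource and uses no DataFusion or Ray APIs.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyPlan:
    """Hypothetical stand-in for a physical plan (not a DataFusion type)."""
    rows: tuple          # source data the plan scans
    min_value: int       # filter predicate: keep values >= min_value
    num_partitions: int  # partition count fixed by the repartition step


def plan_query(rows, min_value, num_partitions):
    # Steps 1-2: "plan" the query, fix the partitioning, serialize the plan.
    return pickle.dumps(ToyPlan(tuple(rows), min_value, num_partitions))


def execute_partition(plan_bytes, partition):
    # Step 3: a worker deserializes the plan and runs only its own partition.
    plan = pickle.loads(plan_bytes)
    part = plan.rows[partition::plan.num_partitions]  # round-robin split
    return [v for v in part if v >= plan.min_value]


def map_batches(batches, fn):
    # Step 4: per-batch processing downstream of the partitioned scan.
    return [fn(batch) for batch in batches]


def run(rows, min_value, num_partitions):
    plan_bytes = plan_query(rows, min_value, num_partitions)
    # The thread pool stands in for one remote task per partition.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        batches = list(
            pool.map(lambda p: execute_partition(plan_bytes, p),
                     range(num_partitions))
        )
    return map_batches(batches, lambda batch: [v * 10 for v in batch])


result = run(range(10), min_value=4, num_partitions=3)
print(result)
```

   The point of the shape is that planning happens once on the driver, only the serialized plan and a partition index travel to each worker, and anything that needs a shuffle happens after the batches come back.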
   
   This has the advantage of DF's load and processing speed, but once we get to 
joins/shuffles, we switch back to Ray. We keep going back and forth between 
this and datafusion-ray but haven't settled on one approach yet, and we might 
need to build our own in the end.
   
   Are you using a similar approach? Or are you integrating DF SQL and 
ray.data at a deeper level, more like smallpond?
   
   Thanks in advance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

