aditanase commented on issue #1103: URL: https://github.com/apache/datafusion-python/issues/1103#issuecomment-2829650344
@tespent I am very intrigued by how you're using DataFusion and ray.data together. If you can share, I'd like to learn more about the interplay of the two systems.

We are also exploring this space. We started with datafusion-ray, then decided to write a simple ray.Datasource that wraps DataFusion for distributed execution of simple queries:

- plan the SQL using DF
- repartition and serialize the physical plan
- execute each partition remotely
- feed into ray.data.map_batches from here on

This has the advantage of DF's load and processing speed, but once we get to joins/shuffles, we switch back to Ray. We keep going back and forth between this and datafusion-ray and haven't settled on one approach yet; we might need to build our own in the end.

Are you using a similar approach? Are you integrating the DF SQL and ray.data at a deeper level? More like smallpond?

Thanks in advance!
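
To make the four steps described in the comment concrete, here is a minimal sketch in plain Python. It is a toy model only: it does not use the DataFusion or Ray APIs, and every name in it (`plan_query`, `execute_partition`, `read_batches`, `map_batches`) is a hypothetical stand-in. JSON stands in for the serialized physical plan, and a thread pool stands in for remote tasks.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List


def plan_query(rows: List[int], min_value: int, num_partitions: int) -> List[bytes]:
    """Steps 1-2 (hypothetical): plan once on the driver, split the work into
    partitions, and serialize one small plan per partition."""
    plans = [
        {"rows": rows[i::num_partitions], "min_value": min_value}
        for i in range(num_partitions)
    ]
    return [json.dumps(p).encode("utf-8") for p in plans]


def execute_partition(serialized: bytes) -> List[int]:
    """Step 3 (hypothetical): a worker deserializes its plan and produces a batch.
    The filter plays the role of work done by the query engine."""
    plan = json.loads(serialized.decode("utf-8"))
    return [r for r in plan["rows"] if r >= plan["min_value"]]


def read_batches(serialized_plans: List[bytes]) -> Iterator[List[int]]:
    """Run each partition on a worker; threads stand in for remote tasks.
    pool.map yields results in the same order as the input plans."""
    with ThreadPoolExecutor() as pool:
        yield from pool.map(execute_partition, serialized_plans)


def map_batches(batches: Iterable[List[int]],
                fn: Callable[[List[int]], List[int]]) -> List[List[int]]:
    """Step 4 (hypothetical): downstream per-batch processing."""
    return [fn(batch) for batch in batches]


def run_example() -> List[List[int]]:
    plans = plan_query(list(range(10)), min_value=4, num_partitions=2)
    return map_batches(read_batches(plans), lambda batch: [x * 10 for x in batch])


if __name__ == "__main__":
    print(run_example())
```

The point of the shape is that only the small serialized plan crosses the driver/worker boundary, and each worker emits batches that the downstream stage consumes independently. Joins and shuffles do not fit this per-partition model, which is consistent with the comment's note about switching back to Ray for those.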