jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879
Hi all, I've been working on a Rust API for the Ray distributed computing framework, which powers many popular Python ML libraries such as RLlib, Ray Train, and Ray Tune. The Rust API is nearing the end of the prototype phase and we are looking for real-world usage of the project. You can view the tracking issue (https://github.com/ray-project/ray/issues/20609) and the prototype progress (https://github.com/ray-project/ray/pull/21572).

I'm quite interested in exploring the use of Ray for highly performant and efficient task scheduling in Ballista. Note that Ray supports locality-aware scheduling, which can perform well even without randomized data partitioning, thus opening new possibilities for Ballista's performance. A second advantage of Ray is that its API is simple, so we don't have to write and maintain our own networking code and protocols.

```rust
// This proc macro generates data marshalling, function registration
// and internal API calls for the remote function
#[ray::remote]
fn my_task(..) { .. }

fn main() {
    let obj = T::new();
    // Put the object into shared memory / the object store
    let id = ray::put::<T>(obj);

    // This can run on a remote node,
    // as scheduled by the distributed scheduler
    let id2 = ray::task(my_task).remote(id);

    // Get the result object from shared memory
    let result = ray::get::<T2>(id2);
    println!("{:?}", result);
}
```

Do let me know if anyone is interested in this. I will be happy to chat.
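To make the locality-aware scheduling point above a bit more concrete, here is a rough, hypothetical sketch of how a single Ballista query stage could map onto Ray tasks. It reuses the prototype `ray::` API from the example above; `ray::ObjectId`, `Partition`, `execute_stage`, and `upstream_partition_ids` are placeholder names for illustration only, not existing Ballista or Ray APIs.

```rust
// Hedged sketch only: the `ray::` calls mirror the prototype API shown above;
// the Ballista-side types and helpers are hypothetical placeholders.

#[derive(Debug)]
struct Partition; // e.g. Arrow RecordBatches or a shuffle-file reference

// Hypothetical helper: in a real integration these IDs would come from the
// scheduler's record of completed upstream stages.
fn upstream_partition_ids() -> Vec<ray::ObjectId> {
    vec![]
}

// A per-partition query stage, made remotely callable by the proc macro.
#[ray::remote]
fn execute_stage(input: Partition) -> Partition {
    // ... run the stage's physical plan against the input partition ...
    input
}

fn main() {
    // Object IDs for upstream stage outputs already sitting in the Ray
    // object store on whichever nodes produced them.
    let partition_ids = upstream_partition_ids();

    // One task per partition: Ray prefers to schedule each task on a node
    // that already holds its input object, so most reads stay local and we
    // don't write any exchange/networking code ourselves.
    let result_ids: Vec<ray::ObjectId> = partition_ids
        .into_iter()
        .map(|id| ray::task(execute_stage).remote(id))
        .collect();

    // Fetch the (usually node-local) results from the object store.
    for id in result_ids {
        let out = ray::get::<Partition>(id);
        println!("{:?}", out);
    }
}
```

The point of the sketch is that Ray core already biases task placement toward nodes holding the task's input objects, so a mapping along these lines would get data-local execution of shuffle consumers largely for free.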