Hey everyone,

I'm one of the maintainers of the Ray project (https://ray.io) and have been investigating an Apache Beam runner for Ray. Our team has seen a lot of interest from users in this, and we are happy to invest significant time.
I've implemented a proof-of-concept Ray-backed SDK worker [0]. It keeps the existing Python Fn API Runner and replaces the multi-threading/multi-processing SDK workers with distributed Ray workers. This works well (it executes workloads on a cluster), but I'm wondering whether this approach can actually scale to large multi-node clusters.

I also have a couple of questions about the general worker scheduling model, specifically whether we should stick with a polling-based worker model or replace it with a push-based model where Ray-native data processing APIs schedule the workload.

At this point it would be great to get more context from someone who is more familiar with the Beam abstractions and codebase, specifically the Python implementation. I'd love to get on a call to chat more.

Thanks and best wishes,
Kai

[0] https://github.com/krfricke/beam/tree/ray-runner-scratch/sdks/python/apache_beam/runners/ray
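
P.S. To make the two models concrete, here is a rough, hypothetical sketch of what I mean. Names like RaySdkWorker and process_bundle are mine for illustration, not code from the PoC in [0]; the Beam-specific pieces are only indicated in comments.

```python
import ray

ray.init()  # or ray.init(address="auto") to join a running cluster


# --- Polling model (what the PoC does) --------------------------------------
# One Ray actor stands in for each multi-processing SDK worker. In the real
# runner the actor body would start Beam's Python SDK harness and point it at
# the Fn API Runner's control endpoint; a placeholder stands in for that here.
@ray.remote
class RaySdkWorker:
    def __init__(self, worker_id: str):
        self.worker_id = worker_id

    def start(self, control_address: str) -> str:
        # Placeholder: the actual worker would open a gRPC channel to
        # `control_address` and poll for process_bundle instructions,
        # exactly like a local worker process does today.
        return f"{self.worker_id} polling {control_address}"


workers = [RaySdkWorker.remote(f"ray-worker-{i}") for i in range(4)]
print(ray.get([w.start.remote("localhost:50000") for w in workers]))


# --- Push model (the open question) ------------------------------------------
# Instead of workers pulling instructions, the runner would push each bundle
# to a Ray task (or e.g. Ray Datasets) and let Ray schedule it on the cluster.
@ray.remote
def process_bundle(bundle):
    # Placeholder for handing the bundle to a Beam bundle processor.
    return [x * 2 for x in bundle]


bundles = [[1, 2, 3], [4, 5, 6]]
print(ray.get([process_bundle.remote(b) for b in bundles]))
```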
