Hi Mich,

Yes, the project is fully open source and is used by enterprises running very large-scale batch scheduling and data processing workloads.
The GitHub repository is https://github.com/armadaproject/armada and the Armada Operator is the simplest way to install it: https://github.com/armadaproject/armada-operator

Kind regards

On Fri, Feb 7, 2025 at 2:33 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi,
>
> Is this the correct link to this open source product?
>
> Armada - how to run millions of batch jobs over thousands of compute nodes using Kubernetes | G-Research
> <https://www.gresearch.com/news/armada-how-to-run-millions-of-batch-jobs-over-thousands-of-compute-nodes-using-kubernetes/>
>
> I am familiar with some of your work at G-Research
>
> HTH
>
> Dr Mich Talebzadeh,
> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> On Thu, 6 Feb 2025 at 23:40, Dejan Pejchev <de...@gr-oss.io> wrote:
>
>> Hello Spark community!
>>
>> My name is Dejan Pejchev. I am a Software Engineer at G-Research and a maintainer of our Kubernetes multi-cluster batch scheduler, Armada.
>>
>> We are trying to build an integration with Spark, where we would like to use spark-submit with a master of the form armada://xxxx, which would then submit the driver and executor jobs to Armada.
>>
>> I understand the concept of the ExternalClusterManager and how I can write and provide a new implementation, but I am not clear on how I can extend Spark to accept it.
>>
>> I see that in SparkSubmit.scala there is a check on the master URL, and it fails if the master isn't one of local, mesos, k8s, or yarn.
>>
>> What is the correct approach for my use case?
>>
>> Thanks in advance,
>> Dejan Pejchev
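
[Editor's note] For context on the ExternalClusterManager question raised in the quoted message: at runtime, SparkContext discovers ExternalClusterManager implementations via Java's ServiceLoader and picks the one whose canCreate accepts the master URL. The sketch below shows the general shape of such an implementation, assuming the Spark 3.x ExternalClusterManager trait; the class and package names (ArmadaClusterManager, org.apache.spark.armada) and the armada:// scheme handling are illustrative assumptions, not actual Armada project code, and the question of how spark-submit's own master-URL check treats custom schemes is left open, as in the thread.

```scala
// Hypothetical sketch of an ExternalClusterManager for an armada:// master.
// Registered for ServiceLoader discovery via a resource file
//   META-INF/services/org.apache.spark.scheduler.ExternalClusterManager
// whose contents are the fully qualified class name below.
//
// Note: TaskSchedulerImpl is private[spark], which is why this sketch lives
// in an org.apache.spark subpackage; a real implementation could instead
// supply its own TaskScheduler.
package org.apache.spark.armada

import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{ExternalClusterManager, SchedulerBackend,
  TaskScheduler, TaskSchedulerImpl}

class ArmadaClusterManager extends ExternalClusterManager {

  // SparkContext asks every registered manager whether it handles this
  // master URL and uses the first (and only) one that returns true.
  override def canCreate(masterURL: String): Boolean =
    masterURL.startsWith("armada://")

  override def createTaskScheduler(sc: SparkContext,
      masterURL: String): TaskScheduler =
    new TaskSchedulerImpl(sc)

  override def createSchedulerBackend(sc: SparkContext, masterURL: String,
      scheduler: TaskScheduler): SchedulerBackend = {
    // A real implementation would return a backend that requests executor
    // pods from Armada; omitted here.
    ???
  }

  // Wire the scheduler and backend together before the scheduler starts.
  override def initialize(scheduler: TaskScheduler,
      backend: SchedulerBackend): Unit =
    scheduler.asInstanceOf[TaskSchedulerImpl].initialize(backend)
}
```

This only addresses discovery inside a running SparkContext; whether spark-submit itself accepts the custom scheme is exactly the open question in the original message.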