Hi Mich,

Yes, the project is fully open source and is used by enterprises running very large-scale batch scheduling and data processing workloads.
The GitHub repository is https://github.com/armadaproject/armada and the Armada Operator is the simplest way to install it: https://github.com/armadaproject/armada-operator

Kind regards

On Fri, Feb 7, 2025 at 2:33 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi,
>
> Is this the correct link to this open source product?
>
> Armada - how to run millions of batch jobs over thousands of compute nodes using Kubernetes | G-Research
> <https://www.gresearch.com/news/armada-how-to-run-millions-of-batch-jobs-over-thousands-of-compute-nodes-using-kubernetes/>
>
> I am familiar with some of your work at G-Research
>
> HTH
>
> Dr Mich Talebzadeh,
> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> On Thu, 6 Feb 2025 at 23:40, Dejan Pejchev <de...@gr-oss.io> wrote:
>
>> Hello Spark community!
>>
>> My name is Dejan Pejchev. I am a Software Engineer at G-Research and a maintainer of our Kubernetes multi-cluster batch scheduler, Armada.
>>
>> We are trying to build an integration with Spark, where we would like to use spark-submit with a master of the form armada://xxxx, which would then submit the driver and executor jobs to Armada.
>>
>> I understand the concept of the ExternalClusterManager and how I can write and provide a new implementation, but I am not clear on how I can extend Spark to accept it.
>>
>> I see that in SparkSubmit.scala there is a check on the master URL, and it fails if the master isn't one of local, mesos, k8s, or yarn.
>>
>> What is the correct approach for my use case?
>>
>> Thanks in advance,
>> Dejan Pejchev
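
[Editor's note] For context on the ExternalClusterManager question raised in the quoted message: at runtime, SparkContext discovers ExternalClusterManager implementations via Java's ServiceLoader and picks the one whose canCreate accepts the master URL. The sketch below shows the general shape of such an implementation, assuming the Spark 3.x ExternalClusterManager trait; the class and package names (ArmadaClusterManager, org.apache.spark.armada) and the armada:// scheme handling are illustrative assumptions, not actual Armada project code, and the question of how spark-submit's own master-URL check treats custom schemes is left open, as in the thread.

```scala
// Hypothetical sketch of an ExternalClusterManager for an armada:// master.
// Registered for ServiceLoader discovery via a resource file
//   META-INF/services/org.apache.spark.scheduler.ExternalClusterManager
// whose contents are the fully qualified class name below.
//
// Note: TaskSchedulerImpl is private[spark], which is why this sketch lives
// in an org.apache.spark subpackage; a real implementation could instead
// supply its own TaskScheduler.
package org.apache.spark.armada

import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{ExternalClusterManager, SchedulerBackend,
  TaskScheduler, TaskSchedulerImpl}

class ArmadaClusterManager extends ExternalClusterManager {

  // SparkContext asks every registered manager whether it handles this
  // master URL and uses the first (and only) one that returns true.
  override def canCreate(masterURL: String): Boolean =
    masterURL.startsWith("armada://")

  override def createTaskScheduler(sc: SparkContext,
      masterURL: String): TaskScheduler =
    new TaskSchedulerImpl(sc)

  override def createSchedulerBackend(sc: SparkContext, masterURL: String,
      scheduler: TaskScheduler): SchedulerBackend = {
    // A real implementation would return a backend that requests executor
    // pods from Armada; omitted here.
    ???
  }

  // Wire the scheduler and backend together before the scheduler starts.
  override def initialize(scheduler: TaskScheduler,
      backend: SchedulerBackend): Unit =
    scheduler.asInstanceOf[TaskSchedulerImpl].initialize(backend)
}
```

This only addresses discovery inside a running SparkContext; whether spark-submit itself accepts the custom scheme is exactly the open question in the original message.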