Well, you can try using an environment variable and a custom script that
modifies the --master URL before invoking spark-submit. This script could
replace "k8s://" with another identifier of your choice (e.g.
"k8s-armada://"), and you would then modify the SparkSubmit code to handle
this custom URL scheme. This may bypass the internal logic within
SparkSubmit that restricts which master URL schemes can be used with
--deploy-mode cluster.

export SPARK_MASTER_URL="k8s://https://$KUBERNETES_MASTER_IP:443"

spark-submit-Armada --verbose \
  --properties-file ${property_file} \
  --master "$SPARK_MASTER_URL" \
  --deploy-mode cluster \
  --name sparkArmada

Then modify (or copy) the SparkSubmit code into a spark-submit-Armada
variant that handles this custom URL, for now just for test/debugging
purposes.
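
To make the custom scheme recognised, the copied SparkSubmit would need an
extra case in the master-URL match. Roughly along these lines (a sketch
only, modelled on how that match looks in recent Spark versions; ARMADA is
a hypothetical new constant alongside YARN, KUBERNETES, etc.):

    val clusterManager: Int = args.master match {
      case "yarn" => YARN
      case m if m.startsWith("spark") => STANDALONE
      // must come before the plain "k8s" case, otherwise startsWith("k8s")
      // matches "k8s-armada://..." first
      case m if m.startsWith("k8s-armada") => ARMADA // hypothetical constant
      case m if m.startsWith("k8s") => KUBERNETES
      case m if m.startsWith("mesos") => MESOS
      case m if m.startsWith("local") => LOCAL
      case _ =>
        error("Master must either be yarn or start with spark, mesos, k8s, or local")
        -1
    }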

HTH

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>





On Fri, 7 Feb 2025 at 14:55, Dejan Pejchev <de...@gr-oss.io> wrote:

> Thanks for the reply Mich!
>
> Good point, the issue is that cluster deploy mode is not possible
> when master is local (
> https://github.com/apache/spark/blob/9cf98ed41b2de1b44c44f0b4d1273d46761459fe/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L308
> ).
> The only way to work around this scenario would be to edit SparkSubmit,
> which we are trying to avoid because we don't want to touch the Spark
> codebase.
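>
> For reference, the logic there is roughly of this shape (a self-contained
> sketch of the check, not the exact upstream code):
>
>     // Mimics SparkSubmit's validation: cluster deploy mode is rejected
>     // when the master URL is a local one.
>     object MasterCheckSketch {
>       def validate(master: String, deployMode: String): Unit = {
>         if (master.startsWith("local") && deployMode == "cluster") {
>           throw new IllegalArgumentException(
>             "Cluster deploy mode is not compatible with master \"local\"")
>         }
>       }
>
>       def main(args: Array[String]): Unit = {
>         validate("local[*]", "cluster") // throws, like spark-submit errors out
>       }
>     }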
>
> Do you have an idea how to run in cluster deploy mode and load an external
> cluster manager?
>
> Could it be possible to submit a PR for a change in SparkSubmit?
>
> Looking forward to your answer!
>
> On Fri, Feb 7, 2025 at 3:45 PM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Well, that should work, but with some considerations.
>>
>> When you use
>>
>>          spark-submit --verbose \
>>            --properties-file ${property_file} \
>>            --master k8s://https://$KUBERNETES_MASTER_IP:443 \
>>            --deploy-mode client \
>>            --name sparkBQ \
>>
>> With *--deploy-mode client*, the driver runs on the client machine (the
>> machine from which the spark-submit command is executed). This is
>> normally used for debugging and small clusters.
>> With *--deploy-mode cluster*, the driver, which is responsible for
>> coordinating the execution of the Spark application, runs *within the
>> Kubernetes cluster* as a separate container,
>>
>> which provides better resource isolation and is more suitable for the
>> type of cluster you are using (Armada).
>>
>> Anyway you can see how it progresses in debugging mode.
>>
>> HTH
>>
>> Dr Mich Talebzadeh,
>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>
>>    view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>
>>
>>
>> On Fri, 7 Feb 2025 at 14:01, Dejan Pejchev <de...@gr-oss.io> wrote:
>>
>>> I got it to work by running it in client mode and using the `local://*`
>>> prefix. My external cluster manager gets injected just fine.
>>>
>>> On Fri, Feb 7, 2025 at 12:38 AM Dejan Pejchev <de...@gr-oss.io> wrote:
>>>
>>>> Hello Spark community!
>>>>
>>>> My name is Dejan Pejchev, and I am a Software Engineer working at
>>>> G-Research, and I am a maintainer of our Kubernetes multi-cluster batch
>>>> scheduler called Armada.
>>>>
>>>> We are trying to build an integration with Spark, where we would like
>>>> to use spark-submit with a master of armada://xxxx, which will then
>>>> submit the driver and executor jobs to Armada.
>>>>
>>>> I understand the concept of the ExternalClusterManager and how I can
>>>> write and provide a new implementation, but I am not clear on how I can
>>>> extend Spark to accept it.
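>>>>
>>>> For context, such an implementation would look roughly like the
>>>> following skeleton (illustrative only; ArmadaClusterManager, the package
>>>> name, and the ??? placeholder are not real code):
>>>>
>>>>     // placed under org.apache.spark so the private[spark] scheduler
>>>>     // APIs are accessible
>>>>     package org.apache.spark.scheduler.armada
>>>>
>>>>     import org.apache.spark.SparkContext
>>>>     import org.apache.spark.scheduler.{ExternalClusterManager,
>>>>       SchedulerBackend, TaskScheduler, TaskSchedulerImpl}
>>>>
>>>>     class ArmadaClusterManager extends ExternalClusterManager {
>>>>       // called by SparkContext to pick a cluster manager for the master URL
>>>>       override def canCreate(masterURL: String): Boolean =
>>>>         masterURL.startsWith("armada://")
>>>>
>>>>       override def createTaskScheduler(sc: SparkContext,
>>>>           masterURL: String): TaskScheduler = new TaskSchedulerImpl(sc)
>>>>
>>>>       override def createSchedulerBackend(sc: SparkContext, masterURL: String,
>>>>           scheduler: TaskScheduler): SchedulerBackend =
>>>>         ??? // an Armada-specific SchedulerBackend would be built here
>>>>
>>>>       override def initialize(scheduler: TaskScheduler,
>>>>           backend: SchedulerBackend): Unit =
>>>>         scheduler.asInstanceOf[TaskSchedulerImpl].initialize(backend)
>>>>     }
>>>>
>>>> It would be registered through a
>>>> META-INF/services/org.apache.spark.scheduler.ExternalClusterManager file
>>>> so that SparkContext can discover it via ServiceLoader.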
>>>>
>>>> I see that in SparkSubmit.scala there is a check for the master URL,
>>>> and it fails if it isn't one of local, mesos, k8s, or yarn.
>>>>
>>>> What is the correct approach for my use case?
>>>>
>>>> Thanks in advance,
>>>> Dejan Pejchev
>>>>
>>>
