Thanks Prasad.

My understanding of what you are implying is that we can have multiple
Docker images available for different use cases, for example:

gcloud container images list-tags eu.gcr.io/<PROJECT_ID>/spark-py

DIGEST        TAGS                                                 TIMESTAMP
e2e71387c295  3.1.1-scala_2.12-8-jre-slim-buster-java8WithPyyaml  2021-12-08T22:56:17
d0bcc195a35f  3.1.2-scala_2.12-8-jre-slim-buster-addedpackages    2021-08-27T20:43:11
229e03971f73  3.1.1-scala_2.12-8-jre-slim-buster-addedpackages    2021-08-22T17:23:50

So spark-submit can utilise any of these images via


           --conf spark.kubernetes.driver.container.image=${IMAGEGCP} \

           --conf spark.kubernetes.executor.container.image=${IMAGEGCP} \


Note that in this case both the driver and executors will use the same
image, and ${IMAGEGCP} can be set to whatever is in the repository. The
point made in previous comments was that the driver could use the basic
image, say 3.1.1-scala_2.12-8-jre-slim-buster-java8WithPyyaml, while the
executors use 3.1.1-scala_2.12-8-jre-slim-buster-addedpackages with the
additional packages, as sketched below.
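
For illustration, a minimal spark-submit sketch of that split (the tags are
taken from the listing above; the master URL and any remaining submit
arguments are placeholders for whatever your setup uses):

    spark-submit --verbose \
           --master k8s://$K8S_SERVER \
           --deploy-mode cluster \
           --conf spark.kubernetes.driver.container.image=eu.gcr.io/<PROJECT_ID>/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-java8WithPyyaml \
           --conf spark.kubernetes.executor.container.image=eu.gcr.io/<PROJECT_ID>/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages \
           ...

With both properties set explicitly, spark.kubernetes.container.image does
not need to be set as well.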


Cheers


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 9 Dec 2021 at 05:59, Prasad Paravatha <prasad.parava...@gmail.com>
wrote:

> I agree with Khalid and Rob. We absolutely need different properties for
> Driver and Executor images for ML use cases.
>
> Here is a real-world example of the setup at our company:
>
>    - Default setup via configmaps: when our Data Scientists request Spark
>    on k8s clusters (they are not familiar with Docker or k8s), we inject Spark
>    default Driver/Executor images (and a whole lot of other default properties).
>    - Our ML Engineers frequently build new Driver and Executor images to
>    include new experimental ML libraries/packages, then test and release them
>    to the wider Data Scientist community.
>
> Regards,
> Prasad
>
> On Thu, Dec 9, 2021 at 12:25 AM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>>
>> Fine. If I go back to the list itself
>>
>>
>> Property Name: spark.kubernetes.container.image
>> Default: (none)
>> Meaning: Container image to use for the Spark application. This is
>> usually of the form example.com/repo/spark:v1.0.0. This configuration is
>> required and must be provided by the user, unless explicit images are
>> provided for each different container type.
>> Since Version: 2.3.0
>>
>> Property Name: spark.kubernetes.driver.container.image
>> Default: (value of spark.kubernetes.container.image)
>> Meaning: Custom container image to use for the driver.
>> Since Version: 2.3.0
>>
>> Property Name: spark.kubernetes.executor.container.image
>> Default: (value of spark.kubernetes.container.image)
>> Meaning: Custom container image to use for executors.
>>
>> If I specify *both* the driver and executor images, then there is no
>> need for the generic container image; it will be ignored. So one either
>> specifies the driver AND executor images explicitly and omits the
>> container image, or
>>
>> specifies only one of the driver or executor images explicitly and then
>> has to set the container image as well so that the other one can default
>> to it. A bit of a long shot. Both combinations are sketched below.
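>>
>> For illustration only (the image variables are hypothetical), the two
>> workable combinations would look like:
>>
>>            # explicit driver and executor images; no generic image needed
>>            --conf spark.kubernetes.driver.container.image=${DRIVER_IMAGE} \
>>            --conf spark.kubernetes.executor.container.image=${EXECUTOR_IMAGE} \
>>
>>            # or a generic image plus an override for only one of the two
>>            --conf spark.kubernetes.container.image=${IMAGEGCP} \
>>            --conf spark.kubernetes.executor.container.image=${EXECUTOR_IMAGE} \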
>>
>>
>> cheers
>>
>> On Wed, 8 Dec 2021 at 18:21, Rob Vesse <rve...@dotnetrdf.org> wrote:
>>
>>> So the point Khalid was trying to make is that there are legitimate
>>> reasons you might use different container images for the driver pod vs the
>>> executor pod.  It has nothing to do with Docker versions.
>>>
>>>
>>>
>>> Since the bulk of the actual work happens on the executors, you may want
>>> additional libraries, tools or software in that image that your job code
>>> can call. This same software may be entirely unnecessary on the driver,
>>> allowing you to use a smaller image for the driver than for the executors.
>>>
>>>
>>>
>>> As a practical example, for an ML use case you might want to have the
>>> optional Intel MKL or OpenBLAS dependencies, which can significantly bloat
>>> the size of your container image (by hundreds of megabytes) and would only
>>> be needed by the executor pods. A rough build sketch follows.
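>>>
>>> For illustration only (the image name and tag are hypothetical, and it
>>> assumes a Debian-based Spark image that runs as the default spark uid 185),
>>> an executor-only image with OpenBLAS could be built roughly like this:
>>>
>>> # extend the stock image with OpenBLAS, for the executors only
>>> docker build -t example.com/repo/spark-py:v1.0.0-executor-blas - <<'EOF'
>>> FROM example.com/repo/spark-py:v1.0.0
>>> USER root
>>> RUN apt-get update && \
>>>     apt-get install -y --no-install-recommends libopenblas-base && \
>>>     rm -rf /var/lib/apt/lists/*
>>> USER 185
>>> EOF
>>> docker push example.com/repo/spark-py:v1.0.0-executor-blas
>>>
>>> The driver would then keep pointing at the slimmer base image via
>>> spark.kubernetes.driver.container.image.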
>>>
>>>
>>>
>>> Rob
>>>
>>>
>>>
>>> *From: *Mich Talebzadeh <mich.talebza...@gmail.com>
>>> *Date: *Wednesday, 8 December 2021 at 17:42
>>> *To: *Khalid Mammadov <khalidmammad...@gmail.com>
>>> *Cc: *"user @spark" <u...@spark.apache.org>, Spark dev list <
>>> dev@spark.apache.org>
>>> *Subject: *Re: docker image distribution in Kubernetes cluster
>>>
>>>
>>>
>>> Thanks Khalid for your notes
>>>
>>>
>>>
>>> I have not come across a use case where the docker version on the driver
>>> and executors needs to be different.
>>>
>>>
>>>
>>> My thinking is that spark.kubernetes.executor.container.image is the
>>> correct reference, as "container" is the correct terminology in Kubernetes,
>>> and both the driver and executors are Spark specific.
>>>
>>>
>>>
>>> cheers
>>>
>>>
>>> On Wed, 8 Dec 2021 at 11:47, Khalid Mammadov <khalidmammad...@gmail.com>
>>> wrote:
>>>
>>> Hi Mich
>>>
>>>
>>>
>>> IMO, it's done to provide the most flexibility. So some users can have a
>>> limited/restricted version of the image, or one with additional software
>>> that they use on the executors during processing.
>>>
>>>
>>>
>>> So, in your case you only need to provide the first one, since the other
>>> two configs will be copied from it.
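>>>
>>> As a sketch of that defaulting (per the docs table quoted earlier, the
>>> driver and executor images fall back to the generic one when not set
>>> explicitly), supplying only
>>>
>>>            --conf spark.kubernetes.container.image=${IMAGEGCP} \
>>>
>>> leaves both spark.kubernetes.driver.container.image and
>>> spark.kubernetes.executor.container.image defaulting to ${IMAGEGCP}.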
>>>
>>>
>>>
>>> Regards
>>>
>>> Khalid
>>>
>>>
>>>
>>> On Wed, 8 Dec 2021, 10:41 Mich Talebzadeh, <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>> Just a correction: the Spark 3.2 documentation
>>> <https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration>
>>> states the following:
>>>
>>>
>>>
>>> Property Name: spark.kubernetes.container.image
>>> Default: (none)
>>> Meaning: Container image to use for the Spark application. This is
>>> usually of the form example.com/repo/spark:v1.0.0. This configuration is
>>> required and must be provided by the user, unless explicit images are
>>> provided for each different container type.
>>> Since Version: 2.3.0
>>>
>>> Property Name: spark.kubernetes.driver.container.image
>>> Default: (value of spark.kubernetes.container.image)
>>> Meaning: Custom container image to use for the driver.
>>> Since Version: 2.3.0
>>>
>>> Property Name: spark.kubernetes.executor.container.image
>>> Default: (value of spark.kubernetes.container.image)
>>> Meaning: Custom container image to use for executors.
>>>
>>> So both the driver and executor images default to the container image. In
>>> my opinion they are redundant and will potentially add confusion, so should
>>> they be removed?
>>>
>>>
>>>
>>> On Wed, 8 Dec 2021 at 10:15, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>> We have three conf parameters to distribute the docker image with
>>> spark-submit in a Kubernetes cluster.
>>>
>>>
>>>
>>> These are
>>>
>>>
>>>
>>> spark-submit --verbose \
>>>
>>>           --conf spark.kubernetes.driver.docker.image=${IMAGEGCP} \
>>>
>>>            --conf spark.kubernetes.executor.docker.image=${IMAGEGCP} \
>>>
>>>            --conf spark.kubernetes.container.image=${IMAGEGCP} \
>>>
>>>
>>>
>>> when the above is run, it shows
>>>
>>>
>>>
>>> (spark.kubernetes.driver.docker.image,eu.gcr.io/axial-glow-224522/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages)
>>>
>>> (spark.kubernetes.executor.docker.image,eu.gcr.io/axial-glow-224522/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages)
>>>
>>> (spark.kubernetes.container.image,eu.gcr.io/axial-glow-224522/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages)
>>>
>>>
>>>
>>> You will notice that I am using the same docker image for the driver,
>>> executor and container. In Spark 3.2 (actually in recent Spark versions), I
>>> cannot see any reference to the driver or executor image properties. Are
>>> these deprecated? It appears that Spark still accepts them?
>>>
>>>
>>>
>>> Thanks
>>>
>
> --
> Regards,
> Prasad Paravatha
>
