So the point Khalid was trying to make is that there are legitimate reasons you might use different container images for the driver pod vs the executor pod. It has nothing to do with Docker versions.
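Purely as an illustration (the registry path and image tags below are made up), pointing the driver and executors at different images would look something like this:

spark-submit --verbose \
  --conf spark.kubernetes.driver.container.image=registry.example.com/spark-driver:3.2.0-slim \
  --conf spark.kubernetes.executor.container.image=registry.example.com/spark-executor:3.2.0-mkl \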
Since the bulk of the actual work happens on the executors, you may want additional libraries, tools or software in that image that your job code can call. The same software may be entirely unnecessary on the driver, allowing you to use a smaller image for the driver than for the executors. As a practical example, for an ML use case you might want the optional Intel MKL or OpenBLAS dependencies, which can significantly bloat the size of your container image (by hundreds of megabytes) and would only be needed by the executor pods.

Rob

From: Mich Talebzadeh <mich.talebza...@gmail.com>
Date: Wednesday, 8 December 2021 at 17:42
To: Khalid Mammadov <khalidmammad...@gmail.com>
Cc: "user @spark" <u...@spark.apache.org>, Spark dev list <dev@spark.apache.org>
Subject: Re: docker image distribution in Kubernetes cluster

Thanks Khalid for your notes.

I have not come across a use case where the docker version on the driver and executors needs to be different. My thinking is that spark.kubernetes.executor.container.image is the correct reference, since in Kubernetes "container" is the correct terminology, and both driver and executors are Spark specific.

cheers

On Wed, 8 Dec 2021 at 11:47, Khalid Mammadov <khalidmammad...@gmail.com> wrote:

Hi Mich,

IMO, it's done to provide the most flexibility. Some users can have a limited/restricted version of the image, or one with additional software that they use on the executors during processing.

So, in your case you only need to provide the first one, since the other two configs will be copied from it.

Regards
Khalid

On Wed, 8 Dec 2021, 10:41 Mich Talebzadeh, <mich.talebza...@gmail.com> wrote:

Just a correction: the Spark 3.2 documentation states the following.

spark.kubernetes.container.image
  Default: (none)
  Meaning: Container image to use for the Spark application. This is usually of the form example.com/repo/spark:v1.0.0. This configuration is required and must be provided by the user, unless explicit images are provided for each different container type.
  Since: 2.3.0

spark.kubernetes.driver.container.image
  Default: (value of spark.kubernetes.container.image)
  Meaning: Custom container image to use for the driver.
  Since: 2.3.0

spark.kubernetes.executor.container.image
  Default: (value of spark.kubernetes.container.image)
  Meaning: Custom container image to use for executors.
  Since: 2.3.0

So both the driver and executor images are mapped to the container image. In my opinion they are redundant and will potentially add confusion, so should they be removed?
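If those defaults are right, supplying only the base property should be enough and the driver and executor images will simply fall back to it, i.e. something like this (untested sketch, rest of the options as before):

spark-submit --verbose \
  --conf spark.kubernetes.container.image=${IMAGEGCP} \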
On Wed, 8 Dec 2021 at 10:15, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi,

We have three conf parameters to distribute the docker image with spark-submit in a Kubernetes cluster. These are:

spark-submit --verbose \
  --conf spark.kubernetes.driver.docker.image=${IMAGEGCP} \
  --conf spark.kubernetes.executor.docker.image=${IMAGEGCP} \
  --conf spark.kubernetes.container.image=${IMAGEGCP} \

When the above is run, it shows:

(spark.kubernetes.driver.docker.image,eu.gcr.io/axial-glow-224522/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages)
(spark.kubernetes.executor.docker.image,eu.gcr.io/axial-glow-224522/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages)
(spark.kubernetes.container.image,eu.gcr.io/axial-glow-224522/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages)

You will notice that I am using the same docker image for the driver, the executors and the container. In Spark 3.2 (and in fact in recent Spark versions) I cannot see any reference to the driver or executor docker images. Are these deprecated? It appears that Spark still accepts them?

Thanks
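For comparison, the equivalent submission using the property names from the documentation quoted above would presumably be:

spark-submit --verbose \
  --conf spark.kubernetes.driver.container.image=${IMAGEGCP} \
  --conf spark.kubernetes.executor.container.image=${IMAGEGCP} \
  --conf spark.kubernetes.container.image=${IMAGEGCP} \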