I agree with Khalid and Rob. We absolutely need separate properties for the driver and executor images for ML use cases.

Here is a real-world example of the setup at our company, where the defaults are injected via ConfigMaps: when our data scientists request Spark on k8s clusters (they are not familiar with Docker or k8s), we inject default driver/executor images (and a whole lot of other default properties) for them. Our ML engineers frequently build new driver and executor images to include new experimental ML libraries/packages, then test and release them to the wider data scientist community. A sketch of what the injected defaults might look like follows.
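As a minimal sketch (the ConfigMap name and image references are illustrative, not our actual setup), the injected defaults could live in a ConfigMap that gets rendered into each user's spark-defaults.conf:

# Hypothetical ConfigMap holding team-wide Spark defaults;
# the keys are real Spark properties, the values are made up.
kubectl create configmap spark-team-defaults \
  --from-literal=spark.kubernetes.driver.container.image=example.com/ml/spark-driver:v1.0.0 \
  --from-literal=spark.kubernetes.executor.container.image=example.com/ml/spark-executor:v1.0.0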
Regards,
Prasad

On Thu, Dec 9, 2021 at 12:25 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Fine. If I go back to the list itself:
>
> Property: spark.kubernetes.container.image
> Default: (none)
> Meaning: Container image to use for the Spark application. This is usually of the form example.com/repo/spark:v1.0.0. This configuration is required and must be provided by the user, unless explicit images are provided for each different container type.
> Since: 2.3.0
>
> Property: spark.kubernetes.driver.container.image
> Default: (value of spark.kubernetes.container.image)
> Meaning: Custom container image to use for the driver.
> Since: 2.3.0
>
> Property: spark.kubernetes.executor.container.image
> Default: (value of spark.kubernetes.container.image)
> Meaning: Custom container image to use for executors.
>
> If I specify *both* the driver and executor images, then there is no need for the generic container image; it will be ignored. So one either specifies the driver AND executor images explicitly and leaves out the container image, or specifies only one of the driver *or* executor images explicitly and then has to set the container image as well for the other to default to. A bit of a long shot.
>
> cheers
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
> On Wed, 8 Dec 2021 at 18:21, Rob Vesse <rve...@dotnetrdf.org> wrote:
>
>> So the point Khalid was trying to make is that there are legitimate reasons you might use different container images for the driver pod vs the executor pod. It has nothing to do with Docker versions.
>>
>> Since the bulk of the actual work happens on the executors, you may want additional libraries, tools or software in the executor image that your job code can call. This same software may be entirely unnecessary on the driver, allowing you to use a smaller image for the driver than for the executors.
>>
>> As a practical example, for an ML use case you might want the optional Intel MKL or OpenBLAS dependencies, which can significantly bloat the size of your container image (by hundreds of megabytes) and would only be needed by the executor pods. A submission along those lines is sketched below.
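>> A minimal sketch of such a submission (the image names, master URL and application path are illustrative placeholders):
>>
>> # Hypothetical images: a slim driver image and a BLAS-enabled executor image.
>> # With both specific properties set, spark.kubernetes.container.image is not needed.
>> spark-submit --verbose \
>>   --master k8s://https://<k8s-apiserver>:443 \
>>   --deploy-mode cluster \
>>   --conf spark.kubernetes.driver.container.image=example.com/repo/spark-slim:v1.0.0 \
>>   --conf spark.kubernetes.executor.container.image=example.com/repo/spark-openblas:v1.0.0 \
>>   local:///path/to/app.py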
>> Rob
>>
>> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
>> *Date:* Wednesday, 8 December 2021 at 17:42
>> *To:* Khalid Mammadov <khalidmammad...@gmail.com>
>> *Cc:* "user @spark" <u...@spark.apache.org>, Spark dev list <dev@spark.apache.org>
>> *Subject:* Re: docker image distribution in Kubernetes cluster
>>
>> Thanks Khalid for your notes.
>>
>> I have not come across a use case where the docker version on the driver and the executors needs to be different.
>>
>> My thinking is that spark.kubernetes.executor.container.image is the correct reference, as in Kubernetes "container" is the correct terminology, and both the driver and executors are Spark-specific.
>>
>> cheers
>>
>> On Wed, 8 Dec 2021 at 11:47, Khalid Mammadov <khalidmammad...@gmail.com> wrote:
>>
>> Hi Mitch,
>>
>> IMO, it's done to provide the most flexibility. So, some users can have a limited/restricted version of the image, or one with additional software that they use on the executors during processing.
>>
>> So, in your case you only need to provide the first one, since the other two configs will be copied from it (see the sketch below).
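>> A minimal illustration of that default behaviour (the image name, master URL and application path are placeholders):
>>
>> # Only the generic property is set; spark.kubernetes.driver.container.image
>> # and spark.kubernetes.executor.container.image fall back to its value.
>> spark-submit --verbose \
>>   --master k8s://https://<k8s-apiserver>:443 \
>>   --deploy-mode cluster \
>>   --conf spark.kubernetes.container.image=example.com/repo/spark:v1.0.0 \
>>   local:///path/to/app.py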
>> Regards,
>> Khalid
>>
>> On Wed, 8 Dec 2021, 10:41 Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>> Just a correction: the Spark 3.2 documentation states
>> <https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration>:
>>
>> Property: spark.kubernetes.container.image
>> Default: (none)
>> Meaning: Container image to use for the Spark application. This is usually of the form example.com/repo/spark:v1.0.0. This configuration is required and must be provided by the user, unless explicit images are provided for each different container type.
>> Since: 2.3.0
>>
>> Property: spark.kubernetes.driver.container.image
>> Default: (value of spark.kubernetes.container.image)
>> Meaning: Custom container image to use for the driver.
>> Since: 2.3.0
>>
>> Property: spark.kubernetes.executor.container.image
>> Default: (value of spark.kubernetes.container.image)
>> Meaning: Custom container image to use for executors.
>>
>> So both the driver and executor images default to the container image. In my opinion they are redundant and will potentially add confusion, so should they be removed?
>>
>> On Wed, 8 Dec 2021 at 10:15, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>> Hi,
>>
>> We have three conf parameters for distributing the docker image with spark-submit in a Kubernetes cluster. These are:
>>
>> spark-submit --verbose \
>>   --conf spark.kubernetes.driver.docker.image=${IMAGEGCP} \
>>   --conf spark.kubernetes.executor.docker.image=${IMAGEGCP} \
>>   --conf spark.kubernetes.container.image=${IMAGEGCP} \
>>
>> When the above is run, it shows:
>>
>> (spark.kubernetes.driver.docker.image, eu.gcr.io/axial-glow-224522/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages)
>> (spark.kubernetes.executor.docker.image, eu.gcr.io/axial-glow-224522/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages)
>> (spark.kubernetes.container.image, eu.gcr.io/axial-glow-224522/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages)
>>
>> You will notice that I am using the same docker image for the driver, the executors and the generic container property. In Spark 3.2 (and in recent Spark versions generally) I cannot see any reference to the driver or executor docker.image properties. Are these deprecated? It appears that Spark still accepts them?
>>
>> Thanks

--
Regards,
Prasad Paravatha