I agree with Khalid and Rob. We absolutely need separate properties for the driver and executor images for ML use cases.

Here is a real-world example of the setup at our company, where the defaults are injected via ConfigMaps: when our data scientists request Spark on k8s clusters (they are not familiar with Docker or k8s), we inject default driver/executor images (and a whole lot of other default properties) for them. Our ML engineers frequently build new driver and executor images to include new experimental ML libraries/packages, then test and release them to the wider data scientist community. A sketch of what the injected defaults might look like follows.
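As a minimal sketch (the ConfigMap name and image references are illustrative, not our actual setup), the injected defaults could live in a ConfigMap that gets rendered into each user's spark-defaults.conf:

# Hypothetical ConfigMap holding team-wide Spark defaults;
# the keys are real Spark properties, the values are made up.
kubectl create configmap spark-team-defaults \
  --from-literal=spark.kubernetes.driver.container.image=example.com/ml/spark-driver:v1.0.0 \
  --from-literal=spark.kubernetes.executor.container.image=example.com/ml/spark-executor:v1.0.0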
Regards,
Prasad

On Thu, Dec 9, 2021 at 12:25 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Fine. If I go back to the list itself:
>
> Property: spark.kubernetes.container.image
> Default: (none)
> Meaning: Container image to use for the Spark application. This is usually of the form example.com/repo/spark:v1.0.0. This configuration is required and must be provided by the user, unless explicit images are provided for each different container type.
> Since: 2.3.0
>
> Property: spark.kubernetes.driver.container.image
> Default: (value of spark.kubernetes.container.image)
> Meaning: Custom container image to use for the driver.
> Since: 2.3.0
>
> Property: spark.kubernetes.executor.container.image
> Default: (value of spark.kubernetes.container.image)
> Meaning: Custom container image to use for executors.
>
> If I specify *both* the driver and executor images, then there is no need for the generic container image; it will be ignored. So one either specifies the driver AND executor images explicitly and leaves out the container image, or specifies only one of the driver *or* executor images explicitly and then has to set the container image as well for the other to default to. A bit of a long shot.
>
> cheers
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
> On Wed, 8 Dec 2021 at 18:21, Rob Vesse <rve...@dotnetrdf.org> wrote:
>
>> So the point Khalid was trying to make is that there are legitimate reasons you might use different container images for the driver pod vs the executor pod. It has nothing to do with Docker versions.
>>
>> Since the bulk of the actual work happens on the executors, you may want additional libraries, tools or software in the executor image that your job code can call. This same software may be entirely unnecessary on the driver, allowing you to use a smaller image for the driver than for the executors.
>>
>> As a practical example, for an ML use case you might want the optional Intel MKL or OpenBLAS dependencies, which can significantly bloat the size of your container image (by hundreds of megabytes) and would only be needed by the executor pods. A submission along those lines is sketched below.
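>> A minimal sketch of such a submission (the image names, master URL and application path are illustrative placeholders):
>>
>> # Hypothetical images: a slim driver image and a BLAS-enabled executor image.
>> # With both specific properties set, spark.kubernetes.container.image is not needed.
>> spark-submit --verbose \
>>   --master k8s://https://<k8s-apiserver>:443 \
>>   --deploy-mode cluster \
>>   --conf spark.kubernetes.driver.container.image=example.com/repo/spark-slim:v1.0.0 \
>>   --conf spark.kubernetes.executor.container.image=example.com/repo/spark-openblas:v1.0.0 \
>>   local:///path/to/app.py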
>> Rob
>>
>> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
>> *Date:* Wednesday, 8 December 2021 at 17:42
>> *To:* Khalid Mammadov <khalidmammad...@gmail.com>
>> *Cc:* "user @spark" <u...@spark.apache.org>, Spark dev list <dev@spark.apache.org>
>> *Subject:* Re: docker image distribution in Kubernetes cluster
>>
>> Thanks Khalid for your notes.
>>
>> I have not come across a use case where the docker version on the driver and the executors needs to be different.
>>
>> My thinking is that spark.kubernetes.executor.container.image is the correct reference, as in Kubernetes "container" is the correct terminology, and both the driver and executors are Spark-specific.
>>
>> cheers
>>
>> On Wed, 8 Dec 2021 at 11:47, Khalid Mammadov <khalidmammad...@gmail.com> wrote:
>>
>> Hi Mitch,
>>
>> IMO, it's done to provide the most flexibility. So, some users can have a limited/restricted version of the image, or one with additional software that they use on the executors during processing.
>>
>> So, in your case you only need to provide the first one, since the other two configs will be copied from it (see the sketch below).
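>> A minimal illustration of that default behaviour (the image name, master URL and application path are placeholders):
>>
>> # Only the generic property is set; spark.kubernetes.driver.container.image
>> # and spark.kubernetes.executor.container.image fall back to its value.
>> spark-submit --verbose \
>>   --master k8s://https://<k8s-apiserver>:443 \
>>   --deploy-mode cluster \
>>   --conf spark.kubernetes.container.image=example.com/repo/spark:v1.0.0 \
>>   local:///path/to/app.py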
>> Regards,
>> Khalid
>>
>> On Wed, 8 Dec 2021, 10:41 Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>> Just a correction: the Spark 3.2 documentation states
>> <https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration>:
>>
>> Property: spark.kubernetes.container.image
>> Default: (none)
>> Meaning: Container image to use for the Spark application. This is usually of the form example.com/repo/spark:v1.0.0. This configuration is required and must be provided by the user, unless explicit images are provided for each different container type.
>> Since: 2.3.0
>>
>> Property: spark.kubernetes.driver.container.image
>> Default: (value of spark.kubernetes.container.image)
>> Meaning: Custom container image to use for the driver.
>> Since: 2.3.0
>>
>> Property: spark.kubernetes.executor.container.image
>> Default: (value of spark.kubernetes.container.image)
>> Meaning: Custom container image to use for executors.
>>
>> So both the driver and executor images default to the container image. In my opinion they are redundant and will potentially add confusion, so should they be removed?
>>
>> On Wed, 8 Dec 2021 at 10:15, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>> Hi,
>>
>> We have three conf parameters for distributing the docker image with spark-submit in a Kubernetes cluster. These are:
>>
>> spark-submit --verbose \
>>   --conf spark.kubernetes.driver.docker.image=${IMAGEGCP} \
>>   --conf spark.kubernetes.executor.docker.image=${IMAGEGCP} \
>>   --conf spark.kubernetes.container.image=${IMAGEGCP} \
>>
>> When the above is run, it shows:
>>
>> (spark.kubernetes.driver.docker.image, eu.gcr.io/axial-glow-224522/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages)
>> (spark.kubernetes.executor.docker.image, eu.gcr.io/axial-glow-224522/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages)
>> (spark.kubernetes.container.image, eu.gcr.io/axial-glow-224522/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-addedpackages)
>>
>> You will notice that I am using the same docker image for the driver, the executors and the generic container property. In Spark 3.2 (and in recent Spark versions generally) I cannot see any reference to the driver or executor docker.image properties. Are these deprecated? It appears that Spark still accepts them?
>>
>> Thanks

--
Regards,
Prasad Paravatha