Matt Cheah created SPARK-24655:
----------------------------------

             Summary: [K8S] Custom Docker Image Expectations and Documentation
                 Key: SPARK-24655
                 URL: https://issues.apache.org/jira/browse/SPARK-24655
             Project: Spark
          Issue Type: Improvement
          Components: Kubernetes
    Affects Versions: 2.3.1
            Reporter: Matt Cheah


A common use case we want to support with Kubernetes is the use of custom 
Docker images. Some examples include:
 * A user builds an application using Gradle or Maven, using Spark as a 
compile-time dependency. The application's jars (both the custom-written jars 
and the dependencies) need to be packaged in a Docker image that can be run via 
spark-submit.
 * A user builds a PySpark or R application and wants to include custom 
dependencies.
 * A user wants to switch the base image from Alpine to CentOS while using 
either built-in or custom jars.

We currently do not document how these custom Docker images are supposed to be 
built, nor do we guarantee that an existing image keeps working across 
spark-submit versions. To illustrate how this can break down, suppose we decide 
to rename the environment variables that carry the driver/executor extra JVM 
options specified by {{spark.[driver|executor].extraJavaOptions}}. If the 
environment variable that spark-submit provides changes, then the user must 
update their custom Dockerfile and build new images.
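
As a rough illustration of that failure mode, here is a minimal Python sketch of 
what a custom image's entrypoint might do. Everything named in it is an 
assumption made for the example: the two environment variable names, the jar 
path {{/opt/app/jars/*}} and the main class {{com.example.Main}} are 
hypothetical and are not taken from the actual contract between spark-submit 
and the image.

```python
import shlex

# Hypothetical variable names, used only for illustration; neither is taken
# from the real spark-submit / Docker image contract.
OLD_OPTS_VAR = "SPARK_DRIVER_JAVA_OPTS"
NEW_OPTS_VAR = "SPARK_DRIVER_EXTRA_JAVA_OPTIONS"


def build_java_command(env, opts_var, main_class):
    """Assemble a JVM command line the way a custom entrypoint might.

    JVM options are read from the environment variable named by opts_var.
    If that variable is not set, no extra options are passed at all.
    """
    jvm_opts = shlex.split(env.get(opts_var, ""))
    # The classpath below is a hypothetical location for the application jars.
    return ["java", *jvm_opts, "-cp", "/opt/app/jars/*", main_class]


# The image was built against the old name, and spark-submit still sets it:
env_from_old_submit = {OLD_OPTS_VAR: "-Xmx2g -Dfoo=bar"}
print(build_java_command(env_from_old_submit, OLD_OPTS_VAR, "com.example.Main"))

# spark-submit now sets the new name, but the image still reads the old one,
# so the extra JVM options are dropped without any error:
env_from_new_submit = {NEW_OPTS_VAR: "-Xmx2g -Dfoo=bar"}
print(build_java_command(env_from_new_submit, OLD_OPTS_VAR, "com.example.Main"))
```

In the second call the options disappear without an error, which is why an 
undocumented rename on the spark-submit side would force users to notice the 
change, edit their Dockerfile, and rebuild.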

Rather than jumping straight to an implementation, it's worth taking a step 
back and considering these matters from the perspective of the end user. To 
that end, this ticket will serve as a forum where we can answer at least the 
following questions, along with any others pertaining to the matter:
 # What steps would a user need to take to build a custom Docker image, given 
their desire to customize the dependencies and the content (OS or otherwise) of 
that image?
 # How can we ensure the user does not need to rebuild the image if only the 
spark-submit version changes?

The end deliverable for this ticket is a design document, and then we'll create 
sub-issues for the technical implementation and documentation of the contract.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

