Rob Vesse created SPARK-25262:
---------------------------------

             Summary: Make Spark local dir volumes configurable with Spark on Kubernetes
                 Key: SPARK-25262
                 URL: https://issues.apache.org/jira/browse/SPARK-25262
             Project: Spark
          Issue Type: Improvement
          Components: Kubernetes
    Affects Versions: 2.3.1, 2.3.0
            Reporter: Rob Vesse


As discussed during review of the design document for SPARK-24434, while pod 
templates will allow more in-depth customisation for Spark on Kubernetes, there 
are some things that cannot be modified because Spark code generates pod specs 
in very specific ways.

The particular issue identified relates to the handling of {{spark.local.dir}}, 
which is done by {{LocalDirsFeatureStep.scala}}.  For each directory specified, 
or a single default directory if none is specified, it creates a Kubernetes 
{{emptyDir}} volume.  As noted in the Kubernetes documentation, this will be 
backed by the node storage 
(https://kubernetes.io/docs/concepts/storage/volumes/#emptydir).  In some 
compute environments this may be extremely undesirable.  For example, with 
diskless compute resources the node storage will likely be a non-performant 
remote mounted disk, often with limited capacity.  For such environments it 
would likely be better to set {{medium: Memory}} on the volume, per the 
Kubernetes documentation, to use a {{tmpfs}} volume instead.
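
To make the current behaviour concrete, here is a minimal Scala sketch of the mapping described above, from {{spark.local.dir}} to one {{emptyDir}} volume per directory. It uses plain case classes rather than the Kubernetes client model classes Spark actually uses, and the volume names ({{spark-local-dir-N}}) and the default directory path are assumptions for illustration only.

```scala
// Simplified stand-ins for the Kubernetes model classes (not the real client API).
// medium = None means node-storage backing; Some("Memory") would mean tmpfs.
case class EmptyDirVolume(name: String, medium: Option[String])
case class VolumeMount(volumeName: String, mountPath: String)

object LocalDirsSketch {
  // Assumed fallback directory when spark.local.dir is unset (illustrative only).
  val DefaultLocalDir = "/var/data/spark-local"

  // Split the comma-separated spark.local.dir value; fall back to one default dir.
  def resolveLocalDirs(conf: Map[String, String]): Seq[String] =
    conf.get("spark.local.dir")
      .map(_.split(",").map(_.trim).filter(_.nonEmpty).toSeq)
      .filter(_.nonEmpty)
      .getOrElse(Seq(DefaultLocalDir))

  // Current behaviour as described: one emptyDir volume (plus a mount) per
  // directory, with no medium set, so the volume is backed by node storage.
  def volumesFor(dirs: Seq[String]): Seq[(EmptyDirVolume, VolumeMount)] =
    dirs.zipWithIndex.map { case (dir, i) =>
      val name = s"spark-local-dir-${i + 1}" // assumed naming scheme
      (EmptyDirVolume(name, medium = None), VolumeMount(name, dir))
    }
}
```

Because {{medium}} is always unset here and every directory gets a freshly generated volume, nothing supplied from outside can change how these directories are backed, which is the gap this issue describes.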

Another closely related issue is that users might want to use a different 
volume type to back the local directories, and currently there is no way to do 
that.

Pod templates will not really solve either of these issues because Spark will 
always attempt to generate a new volume for each local directory and will always 
define these as {{emptyDir}} volumes.

Therefore the proposal is to make two changes to {{LocalDirsFeatureStep}}:

* Provide a new config setting to enable using {{tmpfs}}-backed {{emptyDir}} 
volumes
* Modify the logic to check whether a volume with the same name is already 
defined (e.g. by a pod template) and, if so, skip generating a volume definition 
for it
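
A minimal Scala sketch of the two proposed changes, using simplified case classes instead of the real pod model. The config key name is an assumption (the proposal does not name one), the {{spark-local-dir-N}} naming is likewise assumed, and only volume definitions are modelled; the corresponding volume mounts would still need to reference any pre-existing volume by name.

```scala
// Simplified stand-in for a pod volume (not the real Kubernetes client API).
// emptyDirMedium = Some("Memory") requests a tmpfs-backed emptyDir.
case class Volume(name: String, emptyDirMedium: Option[String])

object ProposedLocalDirsSketch {
  // Assumed key name: the proposal does not specify one.
  val TmpfsKey = "spark.kubernetes.local.dirs.tmpfs"

  def localDirVolumes(
      conf: Map[String, String],
      localDirs: Seq[String],
      existing: Seq[Volume]): Seq[Volume] = {
    // Change 1: opt in to tmpfs by setting medium "Memory" on generated volumes.
    val medium =
      if (conf.get(TmpfsKey).exists(_.trim.toBoolean)) Some("Memory") else None
    // Change 2: skip any volume whose name is already defined, e.g. by a pod template.
    val existingNames = existing.map(_.name).toSet
    localDirs.indices
      .map(i => s"spark-local-dir-${i + 1}") // assumed naming scheme
      .filterNot(existingNames.contains)
      .map(name => Volume(name, medium))
  }
}
```

Matching on the volume name keeps the change small: a pod template that declares, say, a {{hostPath}} or {{persistentVolumeClaim}} volume with the expected name would then back that local directory instead of a generated {{emptyDir}}.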


