[
https://issues.apache.org/jira/browse/SPARK-25262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824993#comment-16824993
]
Udbhav Agrawal commented on SPARK-25262:
----------------------------------------
[~rvesse] for the second part, can we move {{LocalDirsFeatureStep}} out of
{{baseFeatures}} and provide a new configuration, something like
{{spark.kubernetes.emptyDir.disable}}, and handle it in
{{KubernetesDriverBuilder.scala}}? Users would then have the flexibility to
define a different volume type to back the local directories, other than
{{emptyDir}}, through pod templates.
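
A minimal sketch of the suggestion above, using simplified stand-in types
rather than Spark's real classes. {{SimplePod}}, {{FeatureStep}},
{{LocalDirsStep}} and {{DriverBuilderSketch}} are hypothetical names, the
volume naming is illustrative, and {{spark.kubernetes.emptyDir.disable}} is
only the setting proposed in this comment, not an existing Spark option.

```scala
// Simplified stand-ins for illustration; these are not Spark's real classes.
// A pod is reduced to a map of volume name -> volume type.
final case class SimplePod(volumes: Map[String, String])

trait FeatureStep {
  def configurePod(pod: SimplePod): SimplePod
}

// Mimics the behaviour described in the issue: one emptyDir volume per
// local directory.
final class LocalDirsStep(localDirs: Seq[String]) extends FeatureStep {
  override def configurePod(pod: SimplePod): SimplePod = {
    val added = localDirs.indices.map(i => s"spark-local-dir-${i + 1}" -> "emptyDir")
    pod.copy(volumes = pod.volumes ++ added)
  }
}

object DriverBuilderSketch {
  // Hypothetical key, taken from the proposal in this comment.
  val EmptyDirDisableKey = "spark.kubernetes.emptyDir.disable"

  // The local-dirs step is no longer a fixed base feature: it is added
  // only when the proposed flag is absent or not "true".
  def buildSteps(conf: Map[String, String], localDirs: Seq[String]): Seq[FeatureStep] = {
    val disabled = conf.get(EmptyDirDisableKey).exists(_.trim.equalsIgnoreCase("true"))
    if (disabled) Seq.empty else Seq(new LocalDirsStep(localDirs))
  }

  // Applies every step in order, starting from the pod parsed from a template.
  def build(
      conf: Map[String, String],
      localDirs: Seq[String],
      templatePod: SimplePod): SimplePod =
    buildSteps(conf, localDirs).foldLeft(templatePod)((pod, step) => step.configurePod(pod))
}
```

With the flag set to {{true}}, the pod from the template passes through
unchanged, so whatever volume type the template declares for the local
directories is kept.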
> Make Spark local dir volumes configurable with Spark on Kubernetes
> ------------------------------------------------------------------
>
> Key: SPARK-25262
> URL: https://issues.apache.org/jira/browse/SPARK-25262
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 2.3.0, 2.3.1
> Reporter: Rob Vesse
> Priority: Major
>
> As discussed during review of the design document for SPARK-24434, while
> pod templates will provide more in-depth customisation for Spark on
> Kubernetes, there are some things that cannot be modified because Spark code
> generates pod specs in very specific ways.
> The particular issue identified relates to handling of {{spark.local.dirs}}
> which is done by {{LocalDirsFeatureStep.scala}}. For each directory
> specified, or a single default if no explicit specification, it creates a
> Kubernetes {{emptyDir}} volume. As noted in the Kubernetes documentation
> this will be backed by the node storage
> (https://kubernetes.io/docs/concepts/storage/volumes/#emptydir). In some
> compute environments this may be extremely undesirable. For example with
> diskless compute resources the node storage will likely be a non-performant
> remote mounted disk, often with limited capacity. For such environments it
> would likely be better to set {{medium: Memory}} on the volume per the K8S
> documentation to use a {{tmpfs}} volume instead.
> Another closely related issue is that users might want to use a different
> volume type to back the local directories and there is no possibility to do
> that.
> Pod templates will not really solve either of these issues because Spark is
> always going to attempt to generate a new volume for each local directory and
> always going to set these as {{emptyDir}}.
> Therefore the proposal is to make two changes to {{LocalDirsFeatureStep}}:
> * Provide a new config setting to enable using {{tmpfs}} backed {{emptyDir}}
> volumes
> * Modify the logic to check if there is a volume already defined with the
> name and if so skip generating a volume definition for it
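
The two proposed changes can be sketched as follows. This is an illustration
under stated assumptions, not the actual {{LocalDirsFeatureStep}} code:
{{Volume}}, {{LocalDirVolumesSketch}}, {{volumesToAdd}} and the {{useTmpfs}}
parameter are hypothetical, and the volume naming is illustrative.

```scala
// Simplified stand-in type for illustration; not Spark's or any Kubernetes
// client's class.
final case class Volume(name: String, kind: String, medium: Option[String] = None)

object LocalDirVolumesSketch {
  // Hypothetical helper: returns the volumes the step would add for the
  // given local directories, given the volumes already defined on the pod.
  def volumesToAdd(
      localDirs: Seq[String],
      existing: Seq[Volume],
      useTmpfs: Boolean): Seq[Volume] = {
    val existingNames = existing.map(_.name).toSet
    // Change 1: medium "Memory" asks Kubernetes for a tmpfs-backed emptyDir.
    val medium = if (useTmpfs) Some("Memory") else None
    localDirs.indices
      .map(i => s"spark-local-dir-${i + 1}")
      // Change 2: skip names that are already defined, e.g. by a pod template.
      .filterNot(existingNames.contains)
      .map(name => Volume(name, "emptyDir", medium))
  }
}
```

A volume that a pod template already declares under the expected name is left
alone, so it can use any volume type, while the remaining local directories
still receive an {{emptyDir}} volume.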