[
https://issues.apache.org/jira/browse/SPARK-27499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860519#comment-16860519
]
Junjie Chen edited comment on SPARK-27499 at 6/11/19 5:20 AM:
--------------------------------------------------------------
Hi [~dongjoon], I know SPARK_LOCAL_DIRS can be mounted as an emptyDir. However,
an emptyDir is just one directory on the node. I opened this Jira to track a
feature for setting multiple directories, to fully utilize the node's disk
bandwidth for spilling, which I think currently cannot be achieved by setting
spark.local.dir. Even if I set it to multiple dirs, they still map to one
directory on the node.
This Jira was intended to use hostPath volume mounts as spark.local.dir, which
requires mountVolumeFeature to be built before localDirFeature, whereas
currently localDirFeature is built before mountVolumeFeature.
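The ordering problem described above can be illustrated with a toy model. This is only a sketch, not Spark's actual builder code: the function names, the dict layout, the fallback path /var/data/spark-local and the second host path /data/mnt-y are all hypothetical. The "spark-local-dir-" name prefix is borrowed from the example key in the earlier version of this comment.

```python
def mount_volumes_step(pod, volumes):
    """Toy stand-in for the volume-mounting step: adds user-declared volumes."""
    return {**pod, "volumes": pod["volumes"] + list(volumes)}


def local_dirs_step(pod):
    """Toy stand-in for the local-dir step.

    Uses volumes that are ALREADY mounted and whose name starts with
    "spark-local-dir-"; falls back to a single emptyDir if it sees none.
    """
    volumes = list(pod["volumes"])
    local = [v["path"] for v in volumes if v["name"].startswith("spark-local-dir-")]
    if not local:
        fallback = {"name": "fallback-emptydir", "type": "emptyDir",
                    "path": "/var/data/spark-local"}  # hypothetical path
        volumes.append(fallback)
        local = [fallback["path"]]
    return {**pod, "volumes": volumes, "local_dirs": local}


def build(steps):
    """Apply the steps in order to an initially empty pod description."""
    pod = {"volumes": [], "local_dirs": []}
    for step in steps:
        pod = step(pod)
    return pod


user_volumes = [
    {"name": "spark-local-dir-1", "type": "hostPath", "path": "/data/mnt-x"},
    {"name": "spark-local-dir-2", "type": "hostPath", "path": "/data/mnt-y"},
]


def mount(pod):
    return mount_volumes_step(pod, user_volumes)


# Current order: local dirs are decided before the hostPath volumes exist.
current = build([local_dirs_step, mount])
# Proposed order: volumes are mounted first, so the local-dir step can see them.
proposed = build([mount, local_dirs_step])

if __name__ == "__main__":
    print("current :", current["local_dirs"])
    print("proposed:", proposed["local_dirs"])
```

In this model the current order ends up with only the fallback emptyDir as a local dir, while the proposed order picks up both hostPath mounts.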
was (Author: junjie):
Hi [~dongjoon], I know SPARK_LOCAL_DIRS can be mounted as an emptyDir. However,
an emptyDir is just one directory on the node. I opened this Jira to track a
feature for setting multiple directories, to fully utilize the node's disk
bandwidth for spilling, which I think currently cannot be achieved by setting
spark.local.dir. Even if I set it to multiple dirs, they still map to one
directory on the node.
This Jira is intended to use hostPath volume mounts as spark.local.dir, for
example:
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.path=/data/mnt-x
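To show how the single key above would extend to several disks, here is a minimal sketch that generates one pair of properties per host directory. The helper names and the second path /data/mnt-y are hypothetical. Mounting each volume at the same path inside the pod as on the host is an assumption. The `options.path` key follows Spark's documented `spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].options.[OptionName]` pattern. Whether these mounts are then used as spark.local.dir is exactly what this Jira proposes, so treat the output as illustrative.

```python
def local_dir_volume_conf(host_paths, role="executor"):
    """Build Spark conf entries mounting each host path as a hostPath volume.

    Volume names follow the spark-local-dir-N pattern from the example key.
    """
    conf = {}
    for i, path in enumerate(host_paths, start=1):
        prefix = f"spark.kubernetes.{role}.volumes.hostPath.spark-local-dir-{i}"
        conf[f"{prefix}.mount.path"] = path    # path inside the pod (assumed same as host)
        conf[f"{prefix}.options.path"] = path  # path on the node
    return conf


def to_submit_args(conf):
    """Flatten the conf dict into spark-submit style '--conf key=value' arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    # /data/mnt-y is a made-up second disk used only for illustration.
    for arg in to_submit_args(local_dir_volume_conf(["/data/mnt-x", "/data/mnt-y"])):
        print(arg)
```

One volume per physical disk is the design choice here: each spark-local-dir-N entry maps to a distinct directory on the node, unlike a single emptyDir.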
> Support mapping spark.local.dir to hostPath volume
> --------------------------------------------------
>
> Key: SPARK-27499
> URL: https://issues.apache.org/jira/browse/SPARK-27499
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.0.0
> Reporter: Junjie Chen
> Priority: Minor
> Fix For: 2.4.0
>
>
> Currently, the k8s executor builder mounts spark.local.dir as emptyDir or
> memory (tmpfs). This should satisfy small workloads, but under heavy
> workloads like TPCDS both can cause problems: pods are evicted due to disk
> pressure when using emptyDir, and run out of memory (OOM) when using tmpfs.
> In particular, in cloud environments users may allocate a cluster with a
> minimal configuration and add cloud storage when running a workload. In this
> case, we can specify multiple elastic storage volumes as spark.local.dir to
> accelerate spilling.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)