[
https://issues.apache.org/jira/browse/SPARK-55077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
jam xu updated SPARK-55077:
---------------------------
Description:
What changes were proposed in this pull request?
Add support for `spark.kubernetes.archives.avoidDownloadSchemes` configuration
to avoid downloading archives to the driver in Kubernetes cluster mode, similar
to the existing `spark.kubernetes.jars.avoidDownloadSchemes` (SPARK-47475).
Why are the changes needed?
When archives are large and the executor count is high, downloading all archives to
the driver can cause network saturation and timeouts. This feature allows
executors to fetch archives directly from remote storage.
Does this PR introduce any user-facing change?
Yes. It adds a new configuration: `spark.kubernetes.archives.avoidDownloadSchemes`.
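For illustration, a minimal sketch of how the proposed setting might be passed to `spark-submit`. It assumes the value format mirrors `spark.kubernetes.jars.avoidDownloadSchemes` (a comma-separated list of URI schemes); the helper `build_submit_args`, the master URL, the bucket, and the application path are hypothetical placeholders, not taken from the pull request.

```python
import shlex


def build_submit_args(master, app, archives, schemes):
    """Assemble a spark-submit argument list (hypothetical helper).

    Assumption: the proposed archives setting takes a comma-separated
    list of URI schemes, like the existing jars setting.
    """
    avoid = "spark.kubernetes.archives.avoidDownloadSchemes=" + ",".join(schemes)
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", "cluster",
        "--conf", avoid,
        "--archives", ",".join(archives),
        app,
    ]


# Placeholder values for illustration only.
args = build_submit_args(
    master="k8s://https://example-apiserver:6443",
    app="local:///opt/app/main.py",
    archives=["s3a://example-bucket/env.tar.gz#env"],
    schemes=["s3a"],
)

# Print the command as a single shell-quoted line.
print(shlex.join(args))
```

With a setting like this, archives using the listed schemes would presumably be left for executors to fetch from remote storage instead of being downloaded to the driver first.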
> [CORE][K8S] Support spark.kubernetes.archives.avoidDownloadSchemes for K8s
> Cluster Mode
> -----------------------------------------------------------------------------------------
>
> Key: SPARK-55077
> URL: https://issues.apache.org/jira/browse/SPARK-55077
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes, Spark Core
> Affects Versions: 4.0.0
> Reporter: jam xu
> Priority: Major