[ 
https://issues.apache.org/jira/browse/SPARK-55077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jam xu updated SPARK-55077:
---------------------------
    Description: 
What changes were proposed in this pull request?

Add support for `spark.kubernetes.archives.avoidDownloadSchemes` configuration 
to avoid downloading archives to the driver in Kubernetes cluster mode, similar 
to the existing `spark.kubernetes.jars.avoidDownloadSchemes` (SPARK-47475).

Why are the changes needed?

When archives are large and the executor count is high, downloading all 
archives to the driver can cause network saturation and timeouts. This feature 
lets executors fetch archives directly from remote storage instead.

Does this PR introduce any user-facing change?

Yes. It adds a new configuration: `spark.kubernetes.archives.avoidDownloadSchemes`
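
To illustrate the intended behaviour, here is a minimal sketch of how a 
comma-separated scheme list could decide which archives the driver downloads 
and which are left for executors to fetch. It is not the code from the pull 
request: the function name `split_archives` and the example URIs are 
hypothetical, and the `*` wildcard is assumed to behave as it does for 
`spark.kubernetes.jars.avoidDownloadSchemes`.

```python
from urllib.parse import urlparse


def split_archives(archives, avoid_schemes):
    """Partition archive URIs into (driver_downloads, executor_fetches).

    avoid_schemes is a comma-separated string such as "s3a,hdfs".
    Assumption: "*" means "avoid downloading for every scheme".
    """
    schemes = {s.strip().lower() for s in avoid_schemes.split(",") if s.strip()}
    driver, executors = [], []
    for uri in archives:
        # urlparse separates the optional "#alias" fragment from the scheme.
        scheme = urlparse(uri).scheme.lower()
        if "*" in schemes or scheme in schemes:
            executors.append(uri)  # left remote; executors fetch it directly
        else:
            driver.append(uri)  # driver downloads it first, as before
    return driver, executors


driver, executors = split_archives(
    ["s3a://bucket/env.tar.gz#environment", "https://example.com/data.zip"],
    "s3a,hdfs",
)
```

With the value `s3a,hdfs`, the `s3a://` archive is left for executors while the 
`https://` archive still goes through the driver.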

  was:
What changes were proposed in this pull request?

  Add support for `spark.kubernetes.archives.avoidDownloadSchemes`
  configuration to avoid downloading archives to the driver in Kubernetes
  cluster mode, similar to the existing
  `spark.kubernetes.jars.avoidDownloadSchemes` (SPARK-47475).

Why are the changes needed?

  When archives are large and executor count is high, downloading all
  archives to the driver can cause network saturation and timeouts. This
  feature allows executors to fetch archives directly from remote storage.

 

 Does this PR introduce any user-facing change?

  Yes. Add a new configuration:
  `spark.kubernetes.archives.avoidDownloadSchemes`


> [CORE][K8S] Support spark.kubernetes.archives.avoidDownloadSchemes for K8s   
> Cluster Mode
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-55077
>                 URL: https://issues.apache.org/jira/browse/SPARK-55077
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes, Spark Core
>    Affects Versions: 4.0.0
>            Reporter: jam xu
>            Priority: Major
>
> What changes were proposed in this pull request?
> Add support for `spark.kubernetes.archives.avoidDownloadSchemes` 
> configuration to avoid downloading archives to the driver in Kubernetes 
> cluster mode, similar to the existing 
> `spark.kubernetes.jars.avoidDownloadSchemes` (SPARK-47475).
> Why are the changes needed?
> When archives are large and the executor count is high, downloading all 
> archives to the driver can cause network saturation and timeouts. This 
> feature lets executors fetch archives directly from remote storage instead.
> Does this PR introduce any user-facing change?
> Yes. It adds a new configuration: `spark.kubernetes.archives.avoidDownloadSchemes`


