[
https://issues.apache.org/jira/browse/SPARK-47475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun updated SPARK-47475:
----------------------------------
Parent: SPARK-44111
Issue Type: Sub-task (was: Improvement)
> Support `spark.kubernetes.jars.avoidDownloadSchemes` for K8s Cluster Mode
> -------------------------------------------------------------------------
>
> Key: SPARK-47475
> URL: https://issues.apache.org/jira/browse/SPARK-47475
> Project: Spark
> Issue Type: Sub-task
> Components: Deploy, Kubernetes, Spark Core
> Affects Versions: 4.0.0
> Reporter: Jiale Tan
> Assignee: Jiale Tan
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
>
> In K8s cluster deploy mode, all jars, including the primary resource jar
> and jars from {{--jars}} or {{spark.jars}}, are downloaded to the driver's
> local disk and then served to executors through the file server running on
> the driver. When the jars are large and the application requests many
> executors, the massive number of concurrent jar downloads from the driver
> saturates the network. The executors' jar downloads then time out, and the
> executors are terminated. From the user's point of view, the application is
> trapped in a loop of mass executor loss and re-provisioning and never
> reaches the requested number of live executors, which leads to a job SLA
> breach or, sometimes, job failure.
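
To make the saturation argument in the description concrete, here is a rough back-of-the-envelope sketch. It is not Spark code: the function names, the even-sharing model of the driver's egress bandwidth, and every number used (jar size, executor count, link speed, timeout) are assumptions chosen purely for illustration.

```python
def estimated_download_seconds(total_jar_bytes, executors, driver_egress_bytes_per_sec):
    """Rough time for all executors to finish fetching jars from the driver.

    Assumes (hypothetically) that every executor starts at once and that the
    driver's egress bandwidth is shared evenly, so the driver must push
    total_jar_bytes * executors bytes through a single link.
    """
    if executors <= 0 or driver_egress_bytes_per_sec <= 0:
        raise ValueError("executors and bandwidth must be positive")
    return total_jar_bytes * executors / driver_egress_bytes_per_sec


def would_time_out(total_jar_bytes, executors, driver_egress_bytes_per_sec, timeout_seconds):
    """True if the estimated fetch time exceeds the (hypothetical) fetch timeout."""
    estimate = estimated_download_seconds(
        total_jar_bytes, executors, driver_egress_bytes_per_sec
    )
    return estimate > timeout_seconds


if __name__ == "__main__":
    jar_bytes = 500 * 10**6      # assumed: 500 MB of jars in total
    egress = 1250 * 10**6        # assumed: 10 Gbit/s driver link, in bytes/s
    timeout = 120                # assumed: 120 s fetch timeout

    for n in (10, 1000):
        secs = estimated_download_seconds(jar_bytes, n, egress)
        print(n, "executors:", secs, "s, times out:",
              would_time_out(jar_bytes, n, egress, timeout))
```

Under these assumed numbers, 10 executors finish in about 4 seconds, while 1000 executors need about 400 seconds, well past the assumed timeout. That matches the loop of executor loss and re-provisioning described above, and is the case where having executors fetch jars directly from remote storage, rather than through the driver, would help.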