[ https://issues.apache.org/jira/browse/SPARK-57467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Andrew updated SPARK-57467:
---------------------------------
Description:
Even when building identical resource profiles, executors are not reused –
e.g., running the following snippet twice will lead to new executors starting
up both times, which is unnecessary:
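{code:python}
from pyspark.resource import ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests
from pyspark.sql import functions as F

# Assumes an active SparkSession named `spark`.
profile_builder = ResourceProfileBuilder()
executor_requests = ExecutorResourceRequests().cores(12)
task_requests = TaskResourceRequests().cpus(2)
# `build` is a property in PySpark, so it takes no parentheses.
profile = profile_builder.require(executor_requests).require(task_requests).build

def _fn(dfs):
    # Identity function: pass each pandas DataFrame through unchanged.
    for df in dfs:
        yield df

df = spark.range(10).select(F.col("id").cast("string"))
# Positional arguments: func, schema, barrier, profile.
df.mapInPandas(_fn, df.schema, False, profile).show(n=10)
{code}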
ResourceProfileManager has a method 'getEquivalentProfile', but it is only
called in DAGScheduler.mergeResourceProfilesForStage when
stageResourceProfiles.size > 1.
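
To illustrate the gap described above, here is a minimal, Spark-free sketch of the idea behind an equivalence lookup. It is a toy model, not Spark's implementation: `Profile`, `ProfileManager`, `register`, and the `dedupe` flag are hypothetical names, and the assumption that every newly built profile receives a fresh id is only a simplification for this example.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Optional

_next_id = count()  # stand-in for a global profile-id counter (assumption)


@dataclass(frozen=True)
class Profile:
    """Toy resource profile: resource amounts plus a unique id."""
    executor_cores: int
    task_cpus: int
    # The id is excluded from equality so that only resources are compared.
    id: int = field(default_factory=lambda: next(_next_id), compare=False)


class ProfileManager:
    """Toy registry that can find an already-registered equivalent profile."""

    def __init__(self) -> None:
        self._profiles: Dict[int, Profile] = {}

    def get_equivalent_profile(self, rp: Profile) -> Optional[Profile]:
        # Return the first registered profile with the same resource amounts.
        return next((p for p in self._profiles.values() if p == rp), None)

    def register(self, rp: Profile, dedupe: bool) -> Profile:
        # With dedupe=True, reuse an equivalent profile instead of adding a new one.
        if dedupe:
            existing = self.get_equivalent_profile(rp)
            if existing is not None:
                return existing
        self._profiles[rp.id] = rp
        return rp


# Without the lookup, two identical requests end up with different ids.
no_dedupe = ProfileManager()
a = no_dedupe.register(Profile(12, 2), dedupe=False)
b = no_dedupe.register(Profile(12, 2), dedupe=False)
print(a.id == b.id)  # False

# With the lookup, the second request resolves to the first profile.
with_dedupe = ProfileManager()
c = with_dedupe.register(Profile(12, 2), dedupe=True)
d = with_dedupe.register(Profile(12, 2), dedupe=True)
print(c.id == d.id)  # True
```

In this model, anything keyed by profile id (such as a pool of executors) can only be shared in the second case, which is the behavior this issue asks for when a stage has a single resource profile.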
> Executors are not reused for identical resource profiles
> --------------------------------------------------------
>
> Key: SPARK-57467
> URL: https://issues.apache.org/jira/browse/SPARK-57467
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 4.1.2
> Reporter: Peter Andrew
> Priority: Major