[ 
https://issues.apache.org/jira/browse/SPARK-57467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Andrew updated SPARK-57467:
---------------------------------
    Description: 
 

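Executors are not reused across jobs whose resource profiles are built 
separately but are identical. For example, running the following snippet twice 
starts new executors both times, which is unnecessary:
{code:python}
from pyspark.resource import ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests
from pyspark.sql import functions as F

# Assumes an active SparkSession bound to `spark`.
profile_builder = ResourceProfileBuilder()
executor_requests = ExecutorResourceRequests().cores(12)
task_requests = TaskResourceRequests().cpus(2)
# `build` is a property in PySpark, so it takes no parentheses.
profile = profile_builder.require(executor_requests).require(task_requests).build

def _fn(dfs):
    # Identity function: pass each pandas batch through unchanged.
    for df in dfs:
        yield df

df = spark.range(10).select(F.col("id").cast("string"))
# Positional arguments: func, schema, barrier=False, profile.
df.mapInPandas(_fn, df.schema, False, profile).show(n=10)
{code}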
ResourceProfileManager has a method 'getEquivalentProfile', but it is only 
called in DAGScheduler.mergeResourceProfilesForStage when 
stageResourceProfiles.size > 1.
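
The effect described above can be illustrated with a toy model. This is a minimal sketch, not Spark's implementation: `ToyProfile`, `ToyProfileManager`, `register`, and the `reuse_equivalent` flag are hypothetical names. It assumes that every built profile receives a fresh id and that executors are matched to profiles by that id, so two profiles with equal resources still look different unless an equivalence lookup is performed first.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Optional

_next_id = count(1)  # assumption: every newly built profile gets a fresh id


@dataclass(frozen=True)
class ToyProfile:
    """Hypothetical stand-in for a resource profile."""
    executor_cores: int
    task_cpus: int
    # The id is excluded from equality, so == compares resources only.
    id: int = field(default_factory=lambda: next(_next_id), compare=False)


class ToyProfileManager:
    """Hypothetical registry that maps profile ids to profiles."""

    def __init__(self) -> None:
        self._profiles: Dict[int, ToyProfile] = {}

    def get_equivalent_profile(self, rp: ToyProfile) -> Optional[ToyProfile]:
        # Look for an already registered profile with the same resources.
        for existing in self._profiles.values():
            if existing == rp:
                return existing
        return None

    def register(self, rp: ToyProfile, reuse_equivalent: bool) -> ToyProfile:
        # With reuse enabled, an equivalent registered profile wins,
        # so anything keyed by its id (e.g. executors) could be reused.
        if reuse_equivalent:
            existing = self.get_equivalent_profile(rp)
            if existing is not None:
                return existing
        self._profiles[rp.id] = rp
        return rp
```

In this model, registering two separately built but equal profiles with `reuse_equivalent=False` yields two distinct ids, which mirrors the reported behaviour for a stage with a single profile. With `reuse_equivalent=True`, the second registration returns the first profile.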

  was:
 

Even when building identical resource profiles, executors are not reused – 
e.g., running the following snippet twice will lead to new executors starting 
up both times, which is unnecessary:
{code:java}
profile_builder = ResourceProfileBuilder()
executor_requests = ExecutorResourceRequests().cores(12)
task_requests = TaskResourceRequests().cpus(2)
profile = profile_builder.require(executor_requests).require(task_requests).build

def _fn(dfs):
  for df in dfs:
    yield df

df = spark.range(10).select(F.col("id").cast("string"))
df.mapInPandas(_fn, df.schema, False, profile).show(n=10)
{code}
ResourceProfileManager has a method 'getEquivalentProfile', but it is only 
called in DAGScheduler.mergeResourceProfilesForStage when 
stageResourceProfiles.size > 1.


> Executors are not reused for identical resource profiles
> --------------------------------------------------------
>
>                 Key: SPARK-57467
>                 URL: https://issues.apache.org/jira/browse/SPARK-57467
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 4.1.2
>            Reporter: Peter Andrew
>            Priority: Major
>
>  
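> Executors are not reused across jobs whose resource profiles are built 
> separately but are identical. For example, running the following snippet twice 
> starts new executors both times, which is unnecessary:
> {code:python}
> from pyspark.resource import ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests
> from pyspark.sql import functions as F
>
> # Assumes an active SparkSession bound to `spark`.
> profile_builder = ResourceProfileBuilder()
> executor_requests = ExecutorResourceRequests().cores(12)
> task_requests = TaskResourceRequests().cpus(2)
> # `build` is a property in PySpark, so it takes no parentheses.
> profile = profile_builder.require(executor_requests).require(task_requests).build
>
> def _fn(dfs):
>     # Identity function: pass each pandas batch through unchanged.
>     for df in dfs:
>         yield df
>
> df = spark.range(10).select(F.col("id").cast("string"))
> # Positional arguments: func, schema, barrier=False, profile.
> df.mapInPandas(_fn, df.schema, False, profile).show(n=10)
> {code}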
> ResourceProfileManager has a method 'getEquivalentProfile', but it is only 
> called in DAGScheduler.mergeResourceProfilesForStage when 
> stageResourceProfiles.size > 1.


