[ 
https://issues.apache.org/jira/browse/SPARK-31437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084086#comment-17084086
 ] 

Thomas Graves commented on SPARK-31437:
---------------------------------------

So there are multiple reasons they are tied together for the first implementation:

1) It is the way it works now (the user specifies an executor requirement and a task 
requirement together), and it's a much smaller code change with less complexity. 
The idea is to get something in and working, see how it's used, and improve as 
needed.  I've had a hard enough time getting this feature reviewed as is; 
making it more complex would have made that much harder.

2) You have to have a way to say what your executor requirements are. I thought 
about the user only specifying the task requirements, but you still need a way 
to specify the executor requirements as well: either in terms of the task 
requirements (i.e. I want 3 tasks to fit on one executor), or separately, 
because there are things that don't fit into task requirements at all, like 
overhead memory and other confs (to be added later). See the sketch after this 
list for how the two are specified together today.

3) Resource waste, as already discussed. One of the main use cases we targeted 
here is the ETL-to-ML use case.  If you start putting ETL tasks that don't use 
the GPU onto nodes with GPUs, that gets expensive because you are wasting the 
GPUs.  I understand your use case is different, but that wasn't the main target 
for the first implementation.  This is an RDD API, and I would have expected 
much of the ETL to be more Dataset/DataFrame based. If you specify them 
separately, I see that as potentially a huge waste of resources.

4) I think this is much easier for the user to reason about in most cases. They 
know exactly what they get; they don't have to worry about whether the executors 
they requested earlier meet the task requirements, or figure out how many 
resources they are wasting because they didn't configure things properly.
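
For concreteness, a minimal sketch of how executor and task requirements are tied 
together with the ResourceProfile builder API from the stage-level scheduling work 
(the amounts here are purely illustrative):

{code:scala}
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

// Executor-level requirements, including things that only make sense per
// executor, such as overhead memory.
val execReqs = new ExecutorResourceRequests()
  .cores(12)
  .memory("24g")
  .memoryOverhead("4g")

// Task-level requirements: 4 cpus per task, so 3 tasks fit on one 12-core executor.
val taskReqs = new TaskResourceRequests().cpus(4)

// Both sets of requirements are tied together in a single ResourceProfile.
val profile = new ResourceProfileBuilder()
  .require(execReqs)
  .require(taskReqs)
  .build()
{code}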

 

_" At the same time we can still have the opportunity to keep the overall logic 
simple: we can choose one strategy from several to create ResourceProfile from 
incoming ResourceRequest. "_

I don't understand what you mean by this?  

I think overall you would have to be more specific with a proposal for decoupling 
them and then how the coupling would work. Let's say I have my use case where I 
have ETL -> ML.  My ETL tasks use 8 cores; my ML tasks use 8 cores and 4 GPUs.  
How do I keep my ETL tasks from running on the ML executors without wasting 
resources? A sketch of that pipeline with the current API is below.
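
To make the question concrete, here is a rough sketch of the ETL -> ML pipeline under 
the current coupled API, where the GPU requirements are attached only to the ML stage 
(the input path, discovery script path, amounts, and the per-partition training stub 
are all illustrative):

{code:scala}
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("etl-to-ml").getOrCreate()
val sc = spark.sparkContext

// ETL stage: runs under the default profile, so it stays on plain CPU executors.
val etlOutput = sc.textFile("hdfs:///data/input")   // input path is illustrative
  .map(_.split(",").map(_.trim))

// ML stage: executor and task GPU requirements are tied together in one profile,
// so only the tasks of this stage land on the expensive GPU executors.
val gpuExecs = new ExecutorResourceRequests()
  .cores(8)
  .resource("gpu", 4, "/opt/spark/getGpus.sh")   // discovery script path is illustrative
val gpuTasks = new TaskResourceRequests()
  .cpus(8)
  .resource("gpu", 4)
val mlProfile = new ResourceProfileBuilder().require(gpuExecs).require(gpuTasks).build()

val mlInput = etlOutput.withResources(mlProfile)
mlInput.foreachPartition { rows =>
  // hypothetical: hand the partition to a GPU training library here
  val numRows = rows.size
}
{code}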

 

 

> Try assigning tasks to existing executors by which required resources in 
> ResourceProfile are satisfied
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-31437
>                 URL: https://issues.apache.org/jira/browse/SPARK-31437
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>    Affects Versions: 3.1.0
>            Reporter: Hongze Zhang
>            Priority: Major
>
> By the change in the [PR|https://github.com/apache/spark/pull/27773] of 
> SPARK-29154, submitted tasks are scheduled onto executors only if their resource 
> profile IDs strictly match. As a result, Spark always starts new executors for 
> customized ResourceProfiles.
> This limitation makes working with process-local jobs unfriendly. E.g. if task 
> cores have been increased from 1 to 4 in a new stage and an existing executor 
> has 8 slots, it is expected that 2 of the new tasks can run on that executor, 
> but Spark starts new executors for the new ResourceProfile. This behavior is 
> unnecessary.
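
For reference, a minimal sketch of the scenario described in the issue (the RDD and the 
exact amounts are placeholders; the builder calls assume the stage-level scheduling API):

{code:scala}
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

// Existing executors have 8 cores each (8 task slots under the default profile).
// The new stage only raises the per-task cpu requirement from 1 to 4, with the
// same executor shape:
val execReqs = new ExecutorResourceRequests().cores(8)
val taskReqs = new TaskResourceRequests().cpus(4)
val profile = new ResourceProfileBuilder().require(execReqs).require(taskReqs).build()

// Two 4-cpu tasks would fit on an idle 8-core executor, but because the profile
// ID differs from the default profile, Spark currently requests new executors.
val nextStage = someRdd.withResources(profile)   // someRdd is a placeholder RDD
{code}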



