[
https://issues.apache.org/jira/browse/SPARK-31437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084086#comment-17084086
]
Thomas Graves commented on SPARK-31437:
---------------------------------------
So there are multiple reasons they are tied together for the first implementation:
1) It is the way it works now (the user specifies an executor requirement and a
task requirement together), and it is a much smaller code change with less
complexity. Get something in and working, see how it is used, and improve as
needed. I've had a hard enough time getting this feature reviewed as is; making
it more complex would have made that much harder.
2) You have to have a way to say what your executor requirements are. I thought
about letting the user specify just the task requirements, but you still need a
way to specify the executor requirements as well, either in terms of the task
requirements (i.e. "I want 3 tasks to fit on one executor") or separately,
because there are things that don't fit into task requirements at all, such as
overhead memory and other confs (to be added later). See the sketch after this
list for how the two sides are tied together today.
3) Resource waste, as already discussed. One of the main use cases we targeted
here is the ETL-to-ML use case. If you start putting ETL tasks on nodes with
GPUs that don't use the GPU, that gets expensive because you are wasting the
GPU. I understand your use case is different, but that wasn't the main target
for the first implementation. This is an RDD API, and I would have expected
much of the ETL to be more Dataset/DataFrame based. If you specify them
separately, I see that as potentially a huge waste of resources.
4) I think this is much easier for the user to reason about in most cases. They
know exactly what they get, and they don't have to worry about making sure the
executors they requested earlier meet the task requirements, or about figuring
out how many resources they are wasting because they didn't configure things
properly.
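For reference (as mentioned in 2 above), here is a rough sketch of how the
current API ties the two sides together in one ResourceProfile; the amounts and
the discovery script path below are illustrative only, not from this
discussion:
{code:scala}
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

// Executor-side requirements; some of these (e.g. overhead memory) have no
// task-level counterpart, which is one reason both sides live in one profile.
val executorReqs = new ExecutorResourceRequests()
  .cores(8)
  .memory("16g")
  .memoryOverhead("2g")                        // executor-only setting
  .resource("gpu", 1, "/opt/spark/getGpus.sh") // amount + discovery script

// Task-side requirements; combined with the 8 executor cores above this
// effectively says "2 such tasks fit on one executor".
val taskReqs = new TaskResourceRequests()
  .cpus(4)
  .resource("gpu", 0.5)

// One ResourceProfile carries both, and a stage using it only runs on
// executors created for this exact profile.
val profile = new ResourceProfileBuilder()
  .require(executorReqs)
  .require(taskReqs)
  .build
{code}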
_" At the same time we can still have the opportunity to keep the overall logic
simple: we can choose one strategy from several to create ResourceProfile from
incoming ResourceRequest. "_
I don't understand what you mean by this.
I think overall you would have to be more specific on a proposal for decoupling
them and then on how the coupling would work. Let's say I have my use case
where I have ETL -> ML. My ETL tasks use 8 cores; my ML tasks use 8 cores and 4
GPUs. How do I keep my ETL tasks from running on the ML executors without
wasting resources?
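To make that concrete, here is roughly how that pipeline looks with the coupled
profiles today; the parse/train functions, paths, and amounts are made up for
illustration. The ETL stage runs under the default profile on the ordinary
executors, and only the stage carrying the GPU profile is placed on GPU
executors, so ETL tasks never occupy a GPU:
{code:scala}
import org.apache.spark.SparkContext
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

// Placeholder ETL/ML functions, only here to make the sketch compile.
def parse(line: String): Array[Double] = line.split(",").map(_.toDouble)
def train(rows: Iterator[Array[Double]]): Iterator[Double] = rows.map(_.sum)

def pipeline(sc: SparkContext): Unit = {
  // ETL stage: default profile, so it only ever runs on the ordinary
  // (non-GPU) executors requested at application start.
  val features = sc.textFile("hdfs:///input")
    .map(parse)
    .repartition(16) // stage boundary: everything above stays on default executors

  // ML stage: executor and task GPU requirements declared together, so the
  // scheduler brings up GPU executors and only ML tasks land on them.
  val gpuExecutors = new ExecutorResourceRequests()
    .cores(8)
    .memory("16g")
    .resource("gpu", 4, "/opt/spark/getGpus.sh")
  val gpuTasks = new TaskResourceRequests()
    .cpus(8)
    .resource("gpu", 4)
  val mlProfile = new ResourceProfileBuilder()
    .require(gpuExecutors)
    .require(gpuTasks)
    .build

  features
    .withResources(mlProfile)
    .mapPartitions(train)
    .collect()
}
{code}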
> Try assigning tasks to existing executors by which required resources in
> ResourceProfile are satisfied
> ------------------------------------------------------------------------------------------------------
>
> Key: SPARK-31437
> URL: https://issues.apache.org/jira/browse/SPARK-31437
> Project: Spark
> Issue Type: Improvement
> Components: Scheduler
> Affects Versions: 3.1.0
> Reporter: Hongze Zhang
> Priority: Major
>
> By the change in [PR|https://github.com/apache/spark/pull/27773] of
> SPARK-29154, submitted tasks are scheduled onto executors only if the resource
> profile IDs strictly match. As a result, Spark always starts new executors for
> customized ResourceProfiles.
> This limitation makes working with process-local jobs unfriendly. E.g. if task
> cores have been increased from 1 to 4 in a new stage and the executor has 8
> slots, it is expected that 2 new tasks could run on the existing executor, but
> Spark starts new executors for the new ResourceProfile. This behavior is
> unnecessary.
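A rough sketch of the scenario described above (the values and names below are
assumed for illustration): the application's executors have 8 cores, and the
new stage only raises the task cpus from 1 to 4.
{code:scala}
import org.apache.spark.SparkContext
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

def newStage(sc: SparkContext): Array[Long] = {
  // Same executor shape as the existing executors; only the task cpus change.
  val execReqs = new ExecutorResourceRequests().cores(8).memory("8g")
  val taskReqs = new TaskResourceRequests().cpus(4)
  val wideProfile = new ResourceProfileBuilder().require(execReqs).require(taskReqs).build

  // wideProfile gets a fresh profile ID, so although an existing 8-core
  // executor could host 2 of these 4-cpu tasks, the scheduler only matches
  // executors with the identical profile ID and requests new ones instead.
  sc.range(0, 1000)
    .withResources(wideProfile)
    .mapPartitions(iter => iter.map(_ * 2))
    .collect()
}
{code}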