[ 
https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161636#comment-14161636
 ] 

Sandy Ryza commented on SPARK-3797:
-----------------------------------

Not necessarily opposed to this, but I wanted to bring up some of the drawbacks 
of running a Spark shuffle service inside YARN NodeManagers, as well as an 
alternative.

* *Dependencies*.  We will need to avoid dependency conflicts between Spark's 
shuffle service and the rest of the NodeManager.  It's worth keeping in mind 
that the NodeManager may pull in Hadoop server-side dependencies we haven't 
dealt with in the past through depending on hadoop-client, and that its 
dependencies will also need to jibe with those of other auxiliary services like 
MR's and Tez's.  Unlike in Spark, where we place Spark jars in front of Hadoop 
jars and thus allow Spark versions to take precedence, NodeManagers presumably 
run with Hadoop jars in front.
* *Resource management*.  YARN will soon have support for some disk I/O 
isolation and scheduling (YARN-2139).  Running inside the NodeManager means 
that the I/O spent serving shuffle data can't be accounted for under that 
mechanism.
* *Deployment*.  Where currently "installing" Spark on YARN at most means 
placing a Spark assembly jar on HDFS, this would require deploying Spark bits 
on every node in the cluster.
* *Rolling Upgrades*.  Some proposed YARN work will allow containers to 
continue running while NodeManagers restart.  With Spark depending on the 
NodeManager to serve data, these upgrades would interfere with running Spark 
applications in situations where they otherwise might not.
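For reference, wiring an auxiliary service into the NodeManager is done through 
yarn-site.xml, roughly as below.  The service name {{spark_shuffle}} and the 
implementing class are illustrative here, not something this proposal has 
settled on:

```xml
<!-- yarn-site.xml on every NodeManager host.
     Service name and class below are illustrative. -->
<property>
  <!-- Add the Spark shuffle service alongside the existing MR one. -->
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <!-- Class the NodeManager loads to run the service; its jar (and
       dependencies) must be on the NodeManager classpath, which is
       exactly the dependency concern raised above. -->
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```

Note that changing this config requires a NodeManager restart, which ties into 
the rolling-upgrade point above.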

The other option worth considering is to run the shuffle service in containers 
that sit beside the executor(s) on each node.  This avoids all the problems 
above, but brings a couple of its own:
* Under many cluster configurations, YARN expects each container to take up at 
least a minimum amount of memory and CPU.  The shuffle service, which would use 
little of either, would hold those resources unnecessarily.
* Scheduling becomes more difficult.  Spark would require two different 
containers to be scheduled on any node where it wants to run.  Once YARN has 
container resizing (YARN-1197), this could be mitigated by running both 
processes inside a single container.  If Spark wanted to kill an executor, it 
could order the executor process to kill itself and then shrink the container 
to the size of the shuffle service.

> Run the shuffle service inside the YARN NodeManager as an AuxiliaryService
> --------------------------------------------------------------------------
>
>                 Key: SPARK-3797
>                 URL: https://issues.apache.org/jira/browse/SPARK-3797
>             Project: Spark
>          Issue Type: Sub-task
>          Components: YARN
>            Reporter: Patrick Wendell
>            Assignee: Andrew Or
>
> It's also worth considering running the shuffle service in a YARN container 
> beside the executor(s) on each node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
