[ https://issues.apache.org/jira/browse/SPARK-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567345#comment-14567345 ]

Steve Loughran commented on SPARK-4352:
---------------------------------------

As usual, when YARN-1042 is done, life gets easier: the AM can simply ask YARN 
for anti-affine placement.

If you look at how other YARN clients have implemented anti-affinity 
(TWILL-82), the blacklist is used to block off all nodes already in use, with a 
request-at-a-time ramp-up to avoid >1 outstanding request being granted on the 
same node; the pattern is sketched below.
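
To make that concrete, here's a minimal Java sketch against the AMRMClient 
API. Only the AMRMClient calls are real; the class and its bookkeeping are 
invented for illustration:

{code:java}
// Sketch only: one outstanding request at a time, with every node already
// in use blacklisted so the RM can't grant a second container on that host.
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class AntiAffineRequester {
  private final AMRMClient<ContainerRequest> amrmClient;
  private final Set<String> nodesInUse = new HashSet<String>();
  private final Resource capability;
  private final Priority priority;

  public AntiAffineRequester(AMRMClient<ContainerRequest> amrmClient,
      Resource capability, Priority priority) {
    this.amrmClient = amrmClient;
    this.capability = capability;
    this.priority = priority;
  }

  /** Issue exactly one request; all occupied nodes are blacklisted first. */
  public void requestNextContainer() {
    amrmClient.updateBlacklist(new ArrayList<String>(nodesInUse),
        Collections.<String>emptyList());
    amrmClient.addContainerRequest(
        new ContainerRequest(capability, null, null, priority));
  }

  /** On allocation: record the host, then (and only then) ramp up by one. */
  public void onContainerAllocated(Container container, boolean moreNeeded) {
    nodesInUse.add(container.getNodeId().getHost());
    if (moreNeeded) {
      requestNextContainer();
    }
  }
}
{code}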

As well as anti-affinity, life would be even better with dynamic container 
resize: if a single executor could expand/contract its CPU capacity on demand, 
you'd only need one executor per node and could handle multiple tasks by 
scheduling more work there. (This does nothing for RAM consumption, though.)

Now, for some other fun:

# You may want to consider which surplus containers to release, both 
outstanding requests and those actually granted. In particular, if you want to 
cancel >1 outstanding request, which do you choose? Any of them? The newest? 
The oldest? The ones on the nodes with the worst reliability statistics? 
Cancelling the newest works if you assume that the older containers have 
generated more host-local data that you wish to reuse; one such policy is 
sketched below.
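
A "cancel the newest first" policy is easy to express if you keep pending 
requests in issue order. The queueing class here is invented; 
removeContainerRequest() is the real AMRMClient call:

{code:java}
// Sketch only: track pending requests in issue order and cancel from the
// tail, i.e. newest first, when the surplus needs trimming.
import java.util.ArrayDeque;
import java.util.Deque;

import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class NewestFirstCanceller {
  private final AMRMClient<ContainerRequest> amrmClient;
  // Issue order: head is the oldest request, tail is the newest.
  private final Deque<ContainerRequest> pending =
      new ArrayDeque<ContainerRequest>();

  public NewestFirstCanceller(AMRMClient<ContainerRequest> amrmClient) {
    this.amrmClient = amrmClient;
  }

  public void request(ContainerRequest request) {
    pending.addLast(request);
    amrmClient.addContainerRequest(request);
  }

  /** Cancel up to {@code surplus} outstanding requests, newest first. */
  public void cancelSurplus(int surplus) {
    while (surplus-- > 0 && !pending.isEmpty()) {
      amrmClient.removeContainerRequest(pending.removeLast());
    }
  }
}
{code}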

# History may also be a factor in placement. If you are starting a session 
which continues/extends previous work, the previous locations of the executors 
may be the first locality clue. Ask for containers on those nodes and there's a 
high likelihood that the output data from the previous session will be stored 
locally on one of the nodes a container is assigned to (see the sketch below).
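
Something like this, assuming the previous hosts were persisted somewhere and 
reloaded (the persistence is hand-waved; the node-list and relaxLocality 
parameters of ContainerRequest are real):

{code:java}
// Sketch only: seed placement from the previous session's executor hosts.
// relaxLocality=true keeps the node list a hint rather than a hard demand.
import java.util.List;

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class HistoricalPlacement {
  public static void requestOnPreviousHosts(
      AMRMClient<ContainerRequest> amrmClient,
      List<String> previousHosts,   // however the history was persisted
      Resource capability,
      Priority priority) {
    for (String host : previousHosts) {
      amrmClient.addContainerRequest(new ContainerRequest(
          capability,
          new String[] { host },    // node preference from history
          null,                     // no rack preference
          priority,
          true));                   // relaxLocality: fall back if unavailable
    }
  }
}
{code}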

# Testing. There aren't any tests for this, are there? It's possible to 
simulate some of the basic operations: you just need to isolate the code which 
examines the application state and generates container request/release events 
from the actual interaction with the RM.

I've done this before, with each request to allocate/cancel [generating a list 
of operations to be submitted or 
simulated|https://github.com/apache/incubator-slider/blob/develop/slider-core/src/main/java/org/apache/slider/server/appmaster/state/AppState.java#L1908].
 When combined with a [mock YARN 
engine|https://github.com/apache/incubator-slider/tree/develop/slider-core/src/test/groovy/org/apache/slider/server/appmaster/model/mock],
 this let us do things like [test historical placement 
logic|https://github.com/apache/incubator-slider/tree/develop/slider-core/src/test/groovy/org/apache/slider/server/appmaster/model/history],
 as well as decide whether to re-request containers on nodes where containers 
have just recently failed. While that mock stuff isn't that realistic, it can 
be used to test basic placement and failure-handling logic.

More succinctly: you can write tests for this stuff by splitting request 
generation from the API calls and testing the request/release logic 
standalone. A sketch of that split follows.
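
A minimal shape for that split, with every name invented for illustration: a 
pure planner maps application state to a list of operations, which tests can 
assert on directly and production code replays against the real client:

{code:java}
// Sketch only: request/release planning as a pure function, so tests never
// need a live RM. All names here are invented.
import java.util.ArrayList;
import java.util.List;

public class AllocationPlanner {

  public interface Operation {}

  public static class RequestContainer implements Operation {
    public final String preferredHost;   // may be null: no preference
    public RequestContainer(String preferredHost) {
      this.preferredHost = preferredHost;
    }
  }

  public static class ReleaseContainer implements Operation {
    public final String containerId;
    public ReleaseContainer(String containerId) {
      this.containerId = containerId;
    }
  }

  /** Pure function of the application state: trivially unit-testable. */
  public List<Operation> plan(int desired, List<String> runningContainerIds,
      List<String> preferredHosts) {
    List<Operation> ops = new ArrayList<Operation>();
    int delta = desired - runningContainerIds.size();
    for (int i = 0; i < delta; i++) {
      ops.add(new RequestContainer(
          i < preferredHosts.size() ? preferredHosts.get(i) : null));
    }
    // Shrinking: release the newest containers first (see the policy above).
    for (int i = 0; i < -delta; i++) {
      ops.add(new ReleaseContainer(
          runningContainerIds.get(runningContainerIds.size() - 1 - i)));
    }
    return ops;
  }
}
{code}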

> Incorporate locality preferences in dynamic allocation requests
> ---------------------------------------------------------------
>
>                 Key: SPARK-4352
>                 URL: https://issues.apache.org/jira/browse/SPARK-4352
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, YARN
>    Affects Versions: 1.2.0
>            Reporter: Sandy Ryza
>            Assignee: Saisai Shao
>            Priority: Critical
>         Attachments: Supportpreferrednodelocationindynamicallocation.pdf
>
>
> Currently, achieving data locality in Spark is difficult unless an 
> application takes resources on every node in the cluster.  
> preferredNodeLocalityData provides a sort of hacky workaround that has been 
> broken since 1.0.
> With dynamic executor allocation, Spark requests executors in response to 
> demand from the application.  When this occurs, it would be useful to look at 
> the pending tasks and communicate their location preferences to the cluster 
> resource manager. 


