[ 
https://issues.apache.org/jira/browse/SLIDER-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346037#comment-14346037
 ] 

Steve Loughran commented on SLIDER-799:
---------------------------------------

Implementation strategy
# {{OutstandingRequest}} instances gain a timestamp and a "requestRelaxed" flag.
# The timestamp is set from {{RoleHistory.now()}} so that tests can override 
it.
# {{OutstandingRequestTracker}} continues to track requests; continues to 
remove entries when a request is satisfied on a nominated node.
# It can now also enumerate all requests that have been outstanding for longer 
than a specific timeout.
# The caller can then "relax" them: cancel the existing request and re-issue it 
with a relaxed flag (i.e. the alternate YARN priority).
# The outstanding request will remain in the queue, only now marked to show how 
placement has been relaxed.
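
A minimal sketch of the tracking side of the steps above, not the actual Slider code. The class names ({{OutstandingRequestSketch}}, {{OutstandingRequestTrackerSketch}}), the method names and the injected {{LongSupplier}} clock (standing in for {{RoleHistory.now()}}) are all hypothetical. Skipping already-relaxed requests in the scan is also an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical, simplified stand-in for OutstandingRequest. */
final class OutstandingRequestSketch {
    final int roleId;
    final String hostname;
    /** Time the request was issued, taken from the injected clock. */
    final long requestedTime;
    /** True once the request has been re-issued with relaxed placement. */
    boolean requestRelaxed;

    OutstandingRequestSketch(int roleId, String hostname, long requestedTime) {
        this.roleId = roleId;
        this.hostname = hostname;
        this.requestedTime = requestedTime;
    }
}

/** Hypothetical, simplified stand-in for OutstandingRequestTracker. */
final class OutstandingRequestTrackerSketch {
    private final Map<String, OutstandingRequestSketch> placed = new HashMap<>();
    /** Injected clock so that tests can control time. */
    private final LongSupplier clock;

    OutstandingRequestTrackerSketch(LongSupplier clock) {
        this.clock = clock;
    }

    /** Record a request for a role on a nominated node. */
    OutstandingRequestSketch addRequest(int roleId, String hostname) {
        OutstandingRequestSketch request =
            new OutstandingRequestSketch(roleId, hostname, clock.getAsLong());
        placed.put(key(roleId, hostname), request);
        return request;
    }

    /** Remove the entry when a request is satisfied on its nominated node. */
    OutstandingRequestSketch onContainerAllocated(int roleId, String hostname) {
        return placed.remove(key(roleId, hostname));
    }

    /** Enumerate not-yet-relaxed requests outstanding for longer than the timeout. */
    List<OutstandingRequestSketch> listRequestsOlderThan(long timeoutMillis) {
        long now = clock.getAsLong();
        List<OutstandingRequestSketch> result = new ArrayList<>();
        for (OutstandingRequestSketch request : placed.values()) {
            if (!request.requestRelaxed
                && now - request.requestedTime > timeoutMillis) {
                result.add(request);
            }
        }
        return result;
    }

    private static String key(int roleId, String hostname) {
        return roleId + "@" + hostname;
    }
}
```

A test can pass in a clock it controls, advance it past the timeout, and check which requests are returned for relaxation.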

This will need some other changes:
* Some heartbeat event to trigger a relaxation scan, cancel outstanding 
requests and re-issue new ones. This is a bit like review-and-request, except 
now it's cancel-then-re-request.
* Enough state needs to be preserved in {{OutstandingRequest}} to enable the new 
request to be rebuilt (e.g. the YARN requirements).
* This could create a new risk of a race condition: an assignment event comes 
in before or while the new request is being issued.
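
A rough sketch of what the cancel-then-re-request scan might look like, assuming a hypothetical {{RequestIssuer}} interface in place of the real YARN client calls, and hypothetical {{PendingRequest}} fields ({{memoryMb}}, {{vcores}}) for the preserved state. Running the scan and the assignment handler under one lock is only one possible way to order them inside the AM; an allocation already on its way from YARN for the cancelled request may still need separate handling.

```java
import java.util.ArrayList;
import java.util.List;

final class RelaxationScanSketch {

    /** Hypothetical stand-in for the AM's YARN request operations. */
    interface RequestIssuer {
        void cancel(PendingRequest request);
        void issueRelaxed(PendingRequest request);
    }

    /** Hypothetical request record holding enough state to rebuild the request. */
    static final class PendingRequest {
        final String hostname;
        final int memoryMb;
        final int vcores;
        final long requestedTime;
        boolean relaxed;

        PendingRequest(String hostname, int memoryMb, int vcores, long requestedTime) {
            this.hostname = hostname;
            this.memoryMb = memoryMb;
            this.vcores = vcores;
            this.requestedTime = requestedTime;
        }
    }

    private final List<PendingRequest> outstanding = new ArrayList<>();

    synchronized void add(PendingRequest request) {
        outstanding.add(request);
    }

    /** Assignment event: returns true if the request was still outstanding. */
    synchronized boolean onAssigned(PendingRequest request) {
        return outstanding.remove(request);
    }

    /**
     * Heartbeat-triggered scan: cancel each timed-out placed request and
     * re-issue it relaxed. The request stays in the queue, marked as relaxed.
     * Returns the number of requests relaxed in this scan.
     */
    synchronized int relaxOlderThan(long now, long timeoutMillis, RequestIssuer issuer) {
        int relaxedCount = 0;
        for (PendingRequest request : outstanding) {
            if (request.relaxed || now - request.requestedTime <= timeoutMillis) {
                continue;
            }
            issuer.cancel(request);
            request.relaxed = true;
            issuer.issueRelaxed(request);
            relaxedCount++;
        }
        return relaxedCount;
    }
}
```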

> AM to decide when to relax placement policy from specific host to rack/cluster
> ------------------------------------------------------------------------------
>
>                 Key: SLIDER-799
>                 URL: https://issues.apache.org/jira/browse/SLIDER-799
>             Project: Slider
>          Issue Type: Improvement
>          Components: appmaster
>    Affects Versions: Slider 0.70
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Critical
>             Fix For: Slider 0.80
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> If Slider asks for relaxed affinity, YARN only gives it ~1 second for free 
> capacity to appear on a node before it falls back to non-local assignment. 
> While this is OK for analytics throughput, it's suboptimal for placement of 
> code such as HBase region servers.
> AM needs to take charge of the placement and decide for itself when to 
> convert from placed to relaxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
