[
https://issues.apache.org/jira/browse/SLIDER-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346037#comment-14346037
]
Steve Loughran commented on SLIDER-799:
---------------------------------------
Implementation strategy
# {{OutstandingRequest}} instances gain a timestamp and a "requestRelaxed" flag.
# The timestamp is set from {{RoleHistory.now()}} so that tests can override
it.
# {{OutstandingRequestTracker}} continues to track requests; continues to
remove entries when a request is satisfied on a nominated node.
# It can now also enumerate all requests that have been outstanding for longer
than a specific timeout.
# The caller can then "relax" them: cancel the existing request and re-issue it
with a relaxed flag (i.e. the alternate YARN priority).
# The outstanding request remains in the queue, now marked to show that its
placement has been relaxed.
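A minimal Java sketch of the tracking side of this strategy. It is not the Slider code: {{PendingRequest}}, {{Tracker}}, {{overdue()}} and the role/host key are hypothetical names for illustration, and the clock value is passed in explicitly to mirror the test-overridable {{RoleHistory.now()}}. Skipping already-relaxed requests in the scan is an assumption, made so that a request is not relaxed twice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only; class and method names are hypothetical. */
public class RelaxationSketch {

  /** A pending container request pinned to one host. */
  static final class PendingRequest {
    final String host;
    final int roleId;
    final long requestedTime;   // taken from an injectable clock
    boolean requestRelaxed;     // set once re-issued with relaxed placement

    PendingRequest(String host, int roleId, long requestedTime) {
      this.host = host;
      this.roleId = roleId;
      this.requestedTime = requestedTime;
    }
  }

  /** Tracks placed requests, keyed by role and host. */
  static final class Tracker {
    private final Map<String, PendingRequest> placed = new LinkedHashMap<>();

    private static String key(int roleId, String host) {
      return roleId + "@" + host;
    }

    void add(PendingRequest request) {
      placed.put(key(request.roleId, request.host), request);
    }

    /** Removes and returns the entry satisfied on the nominated node, or null. */
    PendingRequest onAllocated(int roleId, String host) {
      return placed.remove(key(roleId, host));
    }

    /** Lists not-yet-relaxed requests that are older than the timeout. */
    List<PendingRequest> overdue(long now, long timeoutMillis) {
      List<PendingRequest> result = new ArrayList<>();
      for (PendingRequest request : placed.values()) {
        if (!request.requestRelaxed
            && now - request.requestedTime > timeoutMillis) {
          result.add(request);
        }
      }
      return result;
    }
  }
}
```

Entries returned by {{overdue()}} stay in the map; the caller marks them relaxed after re-issuing, which matches the point above that the request remains queued.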
This will need some other changes:
* Some heartbeat event to trigger a relaxation scan, cancel outstanding
requests and re-issue new ones. This is a bit like review-and-request, except
now it's cancel-then-re-request.
* Enough state must be preserved in {{OutstandingRequest}} to allow the new
request to be rebuilt (e.g. the YARN requirements).
* This could create a new risk of a race condition: an assignment event
arrives while, or before, the new request is being issued.
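One possible shape for the heartbeat-driven scan, sketched in Java under stated assumptions. {{RequestClient}} is a hypothetical stand-in, not a YARN API, and the string request ids are illustrative. Holding one lock across the scan and the assignment handler is an assumption about how the race might be narrowed locally. It does not stop the resource manager from satisfying a request that is being cancelled, so that case would still need handling.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; names and the locking scheme are assumptions. */
public class CancelThenRerequest {

  /** Hypothetical stand-in for the resource-manager client. */
  interface RequestClient {
    void cancel(String requestId);
    void issue(String requestId, boolean relaxed);
  }

  private static final class Entry {
    final long issuedAt;
    boolean relaxed;

    Entry(long issuedAt) {
      this.issuedAt = issuedAt;
    }
  }

  private final Map<String, Entry> outstanding = new HashMap<>();
  private final RequestClient client;

  CancelThenRerequest(RequestClient client) {
    this.client = client;
  }

  synchronized void track(String requestId, long now) {
    outstanding.put(requestId, new Entry(now));
  }

  /** Assignment event; returns false if the request is no longer tracked. */
  synchronized boolean onAssigned(String requestId) {
    return outstanding.remove(requestId) != null;
  }

  /** Heartbeat scan: cancel each overdue placed request, re-issue it relaxed. */
  synchronized int relaxOverdue(long now, long timeoutMillis) {
    int relaxedCount = 0;
    for (Map.Entry<String, Entry> e : outstanding.entrySet()) {
      Entry entry = e.getValue();
      if (!entry.relaxed && now - entry.issuedAt > timeoutMillis) {
        client.cancel(e.getKey());
        client.issue(e.getKey(), true);
        entry.relaxed = true;   // entry stays queued, now marked as relaxed
        relaxedCount++;
      }
    }
    return relaxedCount;
  }
}
```

Because both methods are synchronized on the same object, a local assignment event is processed either wholly before or wholly after a cancel-then-re-request pair, never in between.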
> AM to decide when to relax placement policy from specific host to rack/cluster
> ------------------------------------------------------------------------------
>
> Key: SLIDER-799
> URL: https://issues.apache.org/jira/browse/SLIDER-799
> Project: Slider
> Issue Type: Improvement
> Components: appmaster
> Affects Versions: Slider 0.70
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Critical
> Fix For: Slider 0.80
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> If Slider asks for relaxed affinity, YARN only gives it ~1 second for free
> capacity to appear on a node before it falls back to non-local assignment.
> While this is OK for analytics throughput, it's suboptimal for placement of
> code such as HBase region servers.
> AM needs to take charge of the placement and decide for itself when to
> convert from placed to relaxed.