[
https://issues.apache.org/jira/browse/SAMZA-886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220376#comment-15220376
]
Jake Maes commented on SAMZA-886:
---------------------------------
One follow-up comment for posterity.
With this issue, host affinity seems to work in a degraded mode and performs
better with jobs that have more containers. The reason is that when these rack
resolver issues occur, none of the preferred host requests are honored and a
set of random containers is returned. However, the HostAwareContainerAllocator
doesn't know this. If any of the random containers happens to match a preferred
host, it uses that container accordingly. And the more containers requested
(for a fixed-size cluster), the higher the probability that random containers
will match a preferred host.
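
The effect described above can be illustrated with a toy model. This is only
a sketch under simplifying assumptions that are not from the ticket: every
container prefers a distinct host, every returned container lands on a
uniformly random host independently of the others, and cluster utilization is
ignored. The function names are hypothetical. It is not a fit to the measured
numbers; it only shows the direction of the effect.

```python
import random


def expected_match_fraction(num_hosts, num_containers):
    """Closed form for the toy model: the chance that a given preferred host
    is hit by at least one of the randomly placed containers."""
    return 1.0 - (1.0 - 1.0 / num_hosts) ** num_containers


def simulate_match_fraction(num_hosts, num_containers, runs=2000, seed=0):
    """Monte Carlo check of the same toy model (assumes
    num_containers <= num_hosts so preferred hosts can be distinct)."""
    rng = random.Random(seed)
    matched = 0
    for _ in range(runs):
        # Outstanding preferred-host requests, one distinct host per container.
        preferred = set(rng.sample(range(num_hosts), num_containers))
        for _ in range(num_containers):
            host = rng.randrange(num_hosts)  # random container from YARN
            if host in preferred:
                # The allocator uses the container for the matching request.
                preferred.remove(host)
                matched += 1
    return matched / (runs * num_containers)


if __name__ == "__main__":
    for count in (10, 30):
        print(count,
              round(expected_match_fraction(36, count), 3),
              round(simulate_match_fraction(36, count), 3))
```

In this model the matched fraction rises with the number of containers
requested for a fixed cluster size, which is the same trend as in the
observations quoted below.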
> Investigate 'relax locality' to improve Host Affinity
> ------------------------------------------------------
>
> Key: SAMZA-886
> URL: https://issues.apache.org/jira/browse/SAMZA-886
> Project: Samza
> Issue Type: Bug
> Reporter: Jagadish
> Assignee: Jake Maes
> Attachments: RelaxedLocality experiments.pdf
>
>
> I ran several tests experimenting with Samza on a 36-node cluster. I
> have the following observations:
> 1. On a cluster with about 50% utilization, the percentage of requests that
> are mapped to preferred hosts seems to depend on yarn.container.count. The
> percentage is higher when yarn.container.count is comparable to the size of
> the cluster. For example, on a 36-node cluster I get about 50% of requests
> matched when yarn.container.count is 30, but only 27% when
> yarn.container.count is 10.
> One reason is that, when spawning a large number of containers initially,
> many requests are made in bulk in quick succession, so there is a good chance
> that any random host in the cluster will match one of the preferred requests.
> However, when respawning a particular container after a failure, there's only
> one request for the failed container, and it has a lower chance of a match.
> The results are averaged across 20 runs in each scenario.
> 2. On a cluster with near-zero utilization, 100% of requests are matched to
> preferred hosts irrespective of yarn.container.count.
> This ticket is to explore alternatives to see if they will improve the
> percentage of requests matched to preferred hosts.
> I believe these ideas are worth trying:
> 1. YARN supports a 'relaxed locality' flag that can be specified with the
> request. We could set 'relaxed locality' to false. (This will ensure that the
> container is allocated on the exact host we ask for.) If the request is not
> satisfied within a timeout, we can re-issue it with 'relaxed locality' set to
> true (as we currently do).
> 2. Re-issue the same preferred host request if the hosts returned don't
> match the request.
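
The first idea above (strict locality first, then a relaxed retry after a
timeout) can be sketched as a small decision function. This is a minimal
model, not Samza or YARN code: HostRequest, next_action, the action strings
and the timeout value are all hypothetical names chosen for illustration. In
the YARN Java client, the flag corresponds to the relaxLocality argument of
AMRMClient.ContainerRequest.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostRequest:
    """One outstanding container request (hypothetical model)."""
    container_id: str
    preferred_host: str
    relax_locality: bool   # False = only the preferred host is acceptable
    issued_at: float       # time the request was issued, in seconds


def next_action(request: HostRequest, allocated_host: Optional[str],
                now: float, timeout_s: float) -> str:
    """Decide what to do with one request.

    allocated_host is the host returned for this request, or None if
    nothing has been allocated yet.
    """
    if allocated_host == request.preferred_host:
        return "run_on_preferred"
    if allocated_host is not None and request.relax_locality:
        # Relaxed request: accept whatever host was returned.
        return "run_on_any"
    if not request.relax_locality and now - request.issued_at >= timeout_s:
        # Strict request expired: fall back to relaxed locality.
        return "reissue_relaxed"
    return "wait"


if __name__ == "__main__":
    strict = HostRequest("container-0", "host-a", False, issued_at=0.0)
    print(next_action(strict, None, now=5.0, timeout_s=10.0))
    print(next_action(strict, None, now=10.0, timeout_s=10.0))
```

The second idea would fit the same shape by returning a "reissue the same
preferred request" action, presumably with a retry limit, when the returned
host does not match.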
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)