[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319009#comment-15319009
 ] 

Adam B commented on MESOS-5545:
-------------------------------

Thinking bigger picture, rack awareness is just a crude approximation of the 
metrics you really care about: 1) latency to the data/nodes your task cares 
about, and 2) fault domains. Latency isn't as trivial as a static topology and 
can vary from cluster to cluster (and can be even more complicated if the data 
is replicated), and fault domains may be hierarchical (or overlapping, in the 
case of network fault domains vs. power fault domains). Rather than adding 
"rack" awareness (plus AZ/region awareness) it may be better to focus on a 
qualitative QoS.
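
As a rough illustration of overlapping fault domains, the sketch below labels each agent with the set of domains it belongs to, rather than a single rack. The agent names, the "power:"/"network:" labels, and both helper functions are hypothetical and are not part of Mesos.

```python
# Hypothetical fault-domain labels per agent; none of these names come from Mesos.
# Power and network domains overlap: agents can share one without sharing the other.
FAULT_DOMAINS = {
    "agent-1": {"power:pdu-a", "network:switch-1"},
    "agent-2": {"power:pdu-a", "network:switch-2"},
    "agent-3": {"power:pdu-b", "network:switch-1"},
    "agent-4": {"power:pdu-b", "network:switch-2"},
}


def shares_fault_domain(a, b, domains=FAULT_DOMAINS):
    """Return True if agents a and b have at least one fault domain in common."""
    return bool(domains[a] & domains[b])


def independent_peers(agent, domains=FAULT_DOMAINS):
    """Return the agents that share no fault domain with `agent`, sorted by name."""
    return sorted(
        other
        for other in domains
        if other != agent and not shares_fault_domain(agent, other, domains)
    )
```

With these made-up labels, agent-2 shares a power domain with agent-1 and agent-3 shares a network domain with it, so a single "rack" attribute could not express which peers are actually independent.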

Example: my task may require <Xms latency between node A and node B. I could 
assume that being on the same "rack" would give me this guarantee, but what if 
the operator installed a second switch in the rack, separating A and B, and the 
connection between the switches fails? Then, even though the "rack_id" 
attribute on these agents stays the same, the latency does not. Conversely, 
what if A and B are on different racks, and their infrastructure is upgraded so 
that the latency between them drops below my Xms threshold? Now my scheduler is 
avoiding these racks even though they meet my performance criteria.
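
The mismatch described above can be sketched as two placement predicates, one keyed on a static rack attribute and one on measured latency. This is only a sketch under stated assumptions: the node names, rack labels, latency figures, and function names are invented for illustration, and nothing here is a Mesos API.

```python
# Hypothetical static attributes: A and B share a rack, C is on another one.
RACK_ID = {"A": "rack-1", "B": "rack-1", "C": "rack-2"}

# Hypothetical measured round-trip latencies in milliseconds.
# A-B is slow despite the shared rack; A-C is fast despite different racks.
LATENCY_MS = {
    frozenset(("A", "B")): 12.0,
    frozenset(("A", "C")): 0.8,
}


def same_rack(a, b):
    """Static check: do both nodes carry the same rack label?"""
    return RACK_ID[a] == RACK_ID[b]


def meets_latency(a, b, threshold_ms):
    """Measured check: is the known latency between a and b under the threshold?"""
    measured = LATENCY_MS.get(frozenset((a, b)))
    # An unmeasured pair is treated as not meeting the requirement.
    return measured is not None and measured < threshold_ms
```

With a 5 ms threshold, the rack check accepts the pair A and B while the latency check rejects it, and the reverse holds for A and C.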

Unfortunately, I don't have any brilliant solutions for acquiring these latency 
and fault domain metrics, as much of that is up to the cloud/infrastructure 
provider.

> Add rack awareness support for Mesos resources
> ----------------------------------------------
>
>                 Key: MESOS-5545
>                 URL: https://issues.apache.org/jira/browse/MESOS-5545
>             Project: Mesos
>          Issue Type: Story
>          Components: hadoop, master
>            Reporter: Fan Du
>         Attachments: RackAwarenessforMesos-Lite.pdf
>
>
> Resources managed by the Mesos master carry no topology information about 
> the cluster, such as rack topology, whereas many data center applications 
> have a rack awareness feature to provide data locality, fault tolerance and 
> intelligent task placement. This ticket investigates how to add rack 
> awareness to the Mesos resource topology.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
