[jira] [Comment Edited] (MESOS-5545) Add rack awareness support for Mesos resources

Joris Van Remoortere (JIRA) Tue, 07 Jun 2016 09:03:52 -0700

    [ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15318745#comment-15318745 ]


Joris Van Remoortere edited comment on MESOS-5545 at 6/7/16 3:58 PM:
---------------------------------------------------------------------

Hi [~fan.du].
Thanks for raising this topic and working on a design doc.
This topic has been discussed a few times before, although mostly during casual 
conversation. It's great that you've captured and documented some ideas.

I would suggest the following next steps:
1. Raise this at the community sync to:
- Get a sense of timeline.
- Find a shepherd.

2. Iterate on the design with the shepherd and a working group.
3. Validate the design with a large user base. This is critical for a component 
change like this one.
4. Then we can get to the patches.

The immediate feedback I can give is:
- Although this is a very fun and interesting project, it hasn't attracted 
enough interest for us to follow through yet. I would focus the most on getting 
it prioritized on the roadmap.
- Mesos is about primitives. Your design doc mixes primitives (great) with some 
implementation / configuration bias (LLDP). I would work on separating 
general fault domain awareness (Mesos) from the assignment of the attributes 
(Operator / automation).
- Take a step back and consider what other information we may want to associate 
with fault domains in the future. Is there a structure that is easier to 
extend later than an {{optional rack_id}}?
- How should schedulers use this information, and what actions may they take 
based upon it? Have we thought through all the actions, and whether they would 
require changes to Mesos?
- You should clarify whether these attributes are expected to change over the 
lifetime of an agent. For example, currently we don't allow resources or IPs 
to change. If this were also true for fault domain attributes, it would 
simplify the implementation. If you feel that dynamic attributes are necessary, 
then I would urge you to make that a phase 2 project and first work with the 
community to agree on a common pattern for updating any attributes on the 
agent, and how to surface consequential changes to both tasks and frameworks. 
(You may see why I suggest static to begin with ;-) )
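
To make the structure and scheduler questions above a bit more concrete, here 
is a minimal Python sketch. It is purely illustrative and assumes nothing about 
the actual design: {{FaultDomain}}, {{name_at}} and {{spread_across}} are 
hypothetical names, not Mesos APIs. It models a fault domain as an ordered path 
of (level, name) pairs instead of a single {{optional rack_id}}, keeps it 
immutable (the static case), and shows one possible scheduler action: spreading 
tasks across distinct domains at a chosen level.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)  # frozen: models attributes that never change on an agent
class FaultDomain:
    """Hypothetical structure, not a Mesos type.

    An ordered path of (level, name) pairs, widest level first,
    e.g. (("region", "eu"), ("rack", "r12")). New levels can be added
    without changing the shape of the structure.
    """
    path: Tuple[Tuple[str, str], ...]

    def name_at(self, level: str) -> Optional[str]:
        """Return the domain name at `level`, or None if that level is absent."""
        for lvl, name in self.path:
            if lvl == level:
                return name
        return None


def spread_across(agents: Dict[str, FaultDomain], level: str, count: int) -> List[str]:
    """Pick up to `count` agent ids, each in a distinct domain at `level`.

    Agents are visited in sorted id order so the result is deterministic.
    Agents that do not report `level` are skipped.
    """
    chosen: List[str] = []
    seen = set()
    for agent_id in sorted(agents):
        if len(chosen) >= count:
            break
        name = agents[agent_id].name_at(level)
        if name is None or name in seen:
            continue
        seen.add(name)
        chosen.append(agent_id)
    return chosen


# Example: two agents share rack r1, one sits in rack r2.
agents = {
    "agent-1": FaultDomain((("region", "eu"), ("rack", "r1"))),
    "agent-2": FaultDomain((("region", "eu"), ("rack", "r1"))),
    "agent-3": FaultDomain((("region", "eu"), ("rack", "r2"))),
}
placement = spread_across(agents, "rack", 2)  # ["agent-1", "agent-3"]
```

Note that the selection logic lives entirely on the scheduler side here. Mesos 
would only need to surface the domain, which is the kind of split between 
primitive and policy I am suggesting.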




> Add rack awareness support for Mesos resources
> ----------------------------------------------
>
>                 Key: MESOS-5545
>                 URL: https://issues.apache.org/jira/browse/MESOS-5545
>             Project: Mesos
>          Issue Type: Story
>          Components: hadoop, master
>            Reporter: Fan Du
>         Attachments: RackAwarenessforMesos-Lite.pdf
>
>
> Resources managed by the Mesos master carry no topology information about the 
> cluster, for example, rack topology. Meanwhile, many data center applications 
> have a rack awareness feature to provide data locality, fault tolerance and 
> intelligent task placement. This ticket investigates how to add rack 
> awareness to the Mesos resource topology.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
