[
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15318745#comment-15318745
]
Joris Van Remoortere edited comment on MESOS-5545 at 6/7/16 3:58 PM:
---------------------------------------------------------------------
Hi [~fan.du].
Thanks for raising this topic and working on a design doc.
This topic has been discussed a few times before, although mostly during casual
conversation. It's great that you've captured and documented some ideas.
I would suggest that the next steps involve:
1. Raising this at the community sync to:
- Get a sense of timeline.
- Find a shepherd.
2. Iterating on the design with the shepherd and a working group.
3. Validating the design with a large user base. This is critical for a component
change like this.
4. Then we can get to the patches.
The immediate feedback I can give is:
- Although this is a very fun and interesting project, it hasn't attracted
enough interest for us to follow through yet. I would focus the most on getting
this prioritized on the roadmap.
- Mesos is about primitives. Your design doc mixes primitives (great) with some
implementation / configuration bias (LLDP). I would work on separating
general fault domain awareness (Mesos) from the assignment of the attributes
(Operator / automation).
- Take a step back and consider what other information we may want to associate
with fault domains in the future. Is there a structure that is more resilient
to augmentation in the future than an {{optional rack_id}}?
- How should schedulers use this information, and what actions may they take
based upon it? Have we thought out all the actions, and whether they would
require changes to Mesos?
- You should clarify whether these attributes are expected to change over the
lifetime of an agent. For example, currently we don't allow resources or IPs
to change. If this were also true for fault domain attributes, it would
simplify the implementation. If you feel that dynamic attributes are necessary,
then I would urge you to make that a phase 2 project and first work with the
community to agree on a common pattern for updating any attributes on the
agent, and how to surface consequential changes to both tasks and frameworks.
(You may see why I suggest static to begin with ;-) )
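To make the questions about structure and scheduler use more concrete, here is a rough sketch, not a proposal for the actual Mesos API. It assumes a static fault domain expressed as an ordered path of (level, name) pairs instead of a single {{optional rack_id}}, so that new levels can be added later without changing the shape. The names {{FaultDomain}}, {{prefix}}, and {{spread}} are hypothetical and exist only for this illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FaultDomain:
    # Hypothetical structure, not an existing Mesos message.
    # Ordered from widest to narrowest level, e.g.
    # (("region", "east"), ("rack", "r1")).
    levels: Tuple[Tuple[str, str], ...]

    def prefix(self, level: str) -> Tuple[Tuple[str, str], ...]:
        """Return the path from the widest level down to `level`, inclusive."""
        for i, (name, _) in enumerate(self.levels):
            if name == level:
                return self.levels[: i + 1]
        raise KeyError(level)


def spread(agents: Dict[str, FaultDomain], level: str, count: int) -> List[str]:
    """Pick up to `count` agents, each from a distinct domain at `level`."""
    by_domain = defaultdict(list)
    for agent_id in sorted(agents):
        by_domain[agents[agent_id].prefix(level)].append(agent_id)
    # Take one agent per domain; sort the domains for a deterministic result.
    picks = [by_domain[domain][0] for domain in sorted(by_domain)]
    return picks[:count]


# Assumed to be assigned once at agent start-up by the operator or automation.
agents = {
    "agent-1": FaultDomain((("region", "east"), ("rack", "r1"))),
    "agent-2": FaultDomain((("region", "east"), ("rack", "r1"))),
    "agent-3": FaultDomain((("region", "east"), ("rack", "r2"))),
    "agent-4": FaultDomain((("region", "west"), ("rack", "r1"))),
}

print(spread(agents, "rack", 3))    # one agent per distinct rack
print(spread(agents, "region", 2))  # one agent per distinct region
```

Because a domain is identified by its full path, rack "r1" in region "east" and rack "r1" in region "west" count as different racks. Keeping the domains static, as suggested above, means a scheduler in this sketch never has to handle an agent moving between domains.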
> Add rack awareness support for Mesos resources
> ----------------------------------------------
>
> Key: MESOS-5545
> URL: https://issues.apache.org/jira/browse/MESOS-5545
> Project: Mesos
> Issue Type: Story
> Components: hadoop, master
> Reporter: Fan Du
> Attachments: RackAwarenessforMesos-Lite.pdf
>
>
> Resources managed by the Mesos master carry no topology information about the
> cluster, such as rack topology. Many data center applications, however, have
> a rack awareness feature to provide data locality, fault tolerance and
> intelligent task placement. This ticket investigates how to add rack
> awareness to the topology of Mesos resources.