[
https://issues.apache.org/jira/browse/AURORA-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075524#comment-15075524
]
Bill Farner commented on AURORA-1109:
-------------------------------------
I don't believe there is anyone working on it currently, but I'm happy to
shepherd if you would like to contribute!
A patch was attempted a while back: https://reviews.apache.org/r/30710/, but
it was too large and diverged from our development practices enough that it was
difficult to accept. However, the discussion in that review will be valuable
context. Additionally, I think some changes in the codebase since that patch
was posted will make the feature easier to add now. My best advice is to
sketch out a proof of concept and, if the patch looks large, try to identify
ways to split it into a few bite-sized commits.
The high-level areas of attention remain the same:
- given an {{Offer}}, we need a policy for deciding which role resources to
prefer
- we need a policy for prioritizing {{Offer}}s based on the resources
within them (do we prefer {{*}} or {{$MY_ROLE}}?)
- when accepting an {{Offer}} we need to reconstruct {{Resources}} selectively
from the appropriate roles
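As a rough illustration of the three areas above, here is a minimal sketch in Python. It is not Aurora or Mesos code: the {{Resource}} and {{Offer}} classes, the function names, and the "prefer reserved resources first" default are all hypothetical, chosen only to show how the pieces could fit together.

```python
from dataclasses import dataclass

DEFAULT_ROLE = "*"  # the Mesos default role


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for one scalar resource entry in an offer."""
    name: str            # e.g. "cpus" or "mem"
    amount: float
    role: str = DEFAULT_ROLE


@dataclass(frozen=True)
class Offer:
    """Hypothetical stand-in for a Mesos offer."""
    offer_id: str
    resources: tuple


def role_order(my_role, prefer_reserved=True):
    """Area 1: decide which role's resources to consume first."""
    roles = [my_role, DEFAULT_ROLE] if prefer_reserved else [DEFAULT_ROLE, my_role]
    # Drop duplicates (my_role may itself be "*") while keeping order.
    return list(dict.fromkeys(roles))


def rank_offers(offers, my_role):
    """Area 2: put offers that carry resources for my_role first.

    sorted() is stable, so offers keep their relative order otherwise.
    """
    return sorted(
        offers,
        key=lambda o: not any(r.role == my_role for r in o.resources),
    )


def allocate(offer, needs, my_role, prefer_reserved=True):
    """Area 3: rebuild the resource list to accept, role by role.

    `needs` maps a resource name to the amount required. Returns the list
    of per-role Resource slices to send back, or None if the offer cannot
    satisfy the request.
    """
    taken = []
    for name, needed in needs.items():
        remaining = needed
        for role in role_order(my_role, prefer_reserved):
            for res in offer.resources:
                if res.name != name or res.role != role or remaining <= 0:
                    continue
                used = min(res.amount, remaining)
                if used > 0:
                    taken.append(Resource(name, used, role))
                    remaining -= used
        if remaining > 1e-9:  # small tolerance for float arithmetic
            return None
    return taken
```

With an offer holding 1 cpu for a role named "aurora" plus 4 cpus in {{*}}, a request for 2 cpus would take 1 cpu from each role under the default policy, and both cpus from {{*}} with prefer_reserved=False. Which default is right is exactly the policy question raised above.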
> Add mesos role feature
> -----------------------
>
> Key: AURORA-1109
> URL: https://issues.apache.org/jira/browse/AURORA-1109
> Project: Aurora
> Issue Type: Story
> Components: Scheduler
> Reporter: zhanglong
> Assignee: zhanglong
>
> Problems
> We are from the eBay platform team. Previously, we used Marathon to launch
> Jenkins master instances on dedicated VMs and receive resource offers from
> the same dedicated VMs. For details, please refer to
> http://www.ebaytechblog.com/2014/04/04/delivering-ebays-ci-solution-with-apache-mesos-part-i/#.VNQUuC6_SPU
> We have found Aurora to be more stable and powerful, so we are moving from
> Marathon to Aurora. During the move, we found that Aurora does not currently
> support Mesos roles. However, we need Mesos roles to solve the problem
> described in the section "Frameworks stopped receiving offers after a while"
> of the given URL.
> Here is a snippet of the problem description:
> The problem we noticed occurred after we used Marathon to create the initial set of CI
> masters. As those CI masters started registering themselves as frameworks,
> Marathon stopped receiving any offers from Mesos; essentially, no new CI
> masters could be launched. Let’s start with Marathon. In the DRF model, it
> was unfair to treat Marathon in the same bucket/role alongside hundreds of
> connected Jenkins frameworks. After launching all these Jenkins frameworks,
> Marathon had a large resource share and Mesos would aggressively offer
> resources to frameworks that were using little or no resources. Marathon was
> placed last in priority and got starved out.
> We decided to define a dedicated Mesos role for Marathon and to have all of
> the Mesos slaves that were reserved for Jenkins master instances support that
> Mesos role. Jenkins frameworks were left with the default role “*”. This
> solved the problem – Mesos offered resources per role and hence Marathon
> never got starved out. A framework with a special role will get resource
> offers from both slaves supporting that special role and also from the
> default role “*”. However, since we were using placement constraints,
> Marathon accepted resource offers only from slaves that supported both the
> role and the placement constraints.
> Solution
> So we added a role feature to the source code to solve the problem in the
> same way: when accepting a resource offer, Aurora sends the needed resources
> back to Mesos with the Mesos role from the resource offer.
> How to configure the Mesos role:
> 1. Add the command-line option --mesos_role=${Mesos role name} when starting
> the Aurora scheduler.
> We changed the test cases to match the code change. Each changed test case is
> green.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)