[ https://issues.apache.org/jira/browse/AURORA-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882006#comment-13882006 ]
brian wickman commented on AURORA-117:
--------------------------------------

One option is to use a simple limit index to quickly determine whether scheduling an instance of a task on a slave would violate any limit constraints. A rough Python implementation is below:

{noformat}
from collections import defaultdict


class LimitIndex(defaultdict):
    """An index to keep track of limit constraints per job.

    Maps attribute name -> attribute value -> number of instances placed.
    """

    def __init__(self, job):
        self.__job = job
        super(LimitIndex, self).__init__(lambda: defaultdict(int))

    def update_job(self, job):
        self.__job = job

    def add_slave(self, slave):
        for name, value in slave.attributes.items():
            self[name][value] += 1

    def remove_slave(self, slave):
        for name, value in slave.attributes.items():
            self[name][value] -= 1

    def is_valid(self, slave):
        """Would adding this slave go over our attribute limit?"""
        for name, limit in self.__job.constraints.limit_tuples():
            # Already at the limit means one more instance would exceed it.
            if self[name][slave.attributes[name]] >= limit:
                return False
        return True
{noformat}

> Scheduler performance issues with very large jobs
> -------------------------------------------------
>
>                 Key: AURORA-117
>                 URL: https://issues.apache.org/jira/browse/AURORA-117
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: Bill Farner
>
> The scheduler tends to have performance issues when scheduling very large
> jobs.  We've observed this with jobs exceeding 2000 instances.  The
> {{TaskScheduler}} thread tends to consume a large amount of CPU (100%,
> limited by the global storage lock).  Current hypothesis is that the majority
> of the time is spent satisfying diversity constraints (rack, machine), which
> require expensive queries.
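
To illustrate how a limit index like the one in the comment might be consulted during placement, here is a minimal, self-contained sketch. It is not Aurora's scheduler code: `Slave`, `place_instances`, and the `limits` dict (attribute name to maximum instances per attribute value) are hypothetical stand-ins, and the sketch assumes every slave carries every limited attribute. The point is only that each candidate check costs one counter lookup per limit constraint, rather than a query over all of the job's tasks.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a slave; not an Aurora type.
Slave = namedtuple("Slave", ["name", "attributes"])


def place_instances(num_instances, slaves, limits):
    """Greedily assign instances to slaves while honoring per-attribute limits.

    limits maps attribute name -> max instances per attribute value.
    Returns a list of slave names, one per instance that could be placed.
    """
    # attribute name -> attribute value -> instances placed so far
    counts = defaultdict(lambda: defaultdict(int))
    placements = []
    for _ in range(num_instances):
        for slave in slaves:
            # One counter lookup per limit constraint.
            if all(counts[name][slave.attributes[name]] < limit
                   for name, limit in limits.items()):
                for name, value in slave.attributes.items():
                    counts[name][value] += 1
                placements.append(slave.name)
                break
        else:
            # No slave can take another instance without violating a limit.
            break
    return placements


slaves = [
    Slave("a1", {"rack": "a", "host": "a1"}),
    Slave("a2", {"rack": "a", "host": "a2"}),
    Slave("b1", {"rack": "b", "host": "b1"}),
]
# At most 2 instances per rack and 1 per host.
placements = place_instances(5, slaves, {"rack": 2, "host": 1})
print(placements)
```

With these three slaves only three of the five requested instances can be placed, because the host limit of 1 is exhausted on every slave. A real scheduler would also need to decrement the counters when a task leaves a slave, as `remove_slave` does in the comment's version.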