Jie Yu created MESOS-2919:
-----------------------------

             Summary: Framework can overcommit oversubscribable resources 
during master failover.
                 Key: MESOS-2919
                 URL: https://issues.apache.org/jira/browse/MESOS-2919
             Project: Mesos
          Issue Type: Bug
            Reporter: Jie Yu
            Priority: Critical


This is due to a bug in the hierarchical allocator. Here is the sequence of 
events:

1) slave uses a fixed resource estimator that advertises 4 revocable cpus
2) framework A launches a task that uses all 4 revocable cpus
3) master fails over
4) slave re-registers with the new master, and sends UpdateSlaveMessage with 4 
revocable cpus as oversubscribed resources
5) framework A hasn't re-registered yet, so the allocator does not count its 
running task, and the slave's available resources are computed as 4 revocable 
cpus
6) framework A re-registers and is offered an additional 4 revocable cpus, so 
it can launch another task with 4 revocable cpus (that means 8 total!)

The problem lies in how the allocator calculates the allocated resources in 
'updateSlave'. If the framework is not registered, the 'allocation' computed 
below is inaccurate, because the 'if' block in 'addSlave' skips frameworks 
that are not registered.

{code}
template <class RoleSorter, class FrameworkSorter>
void
HierarchicalAllocatorProcess<RoleSorter, FrameworkSorter>::updateSlave(
    const SlaveID& slaveId,
    const Resources& oversubscribed)
{
  CHECK(initialized);
  CHECK(slaves.contains(slaveId));

  // Check that all the oversubscribed resources are revocable.
  CHECK_EQ(oversubscribed, oversubscribed.revocable());

  // Update the total resources.

  // First remove the old oversubscribed resources from the total.
  slaves[slaveId].total -= slaves[slaveId].total.revocable();

  // Now add the new estimate of oversubscribed resources.
  slaves[slaveId].total += oversubscribed;

  // Now, update the total resources in the role sorter.
  roleSorter->update(
      slaveId,
      slaves[slaveId].total.unreserved());

  // Calculate the current allocation of oversubscribed resources.
  Resources allocation;
  foreachkey (const std::string& role, roles) {
    allocation += roleSorter->allocation(role, slaveId).revocable();
  }

  // Update the available resources.

  // First remove the old oversubscribed resources from available.
  slaves[slaveId].available -= slaves[slaveId].available.revocable();

  // Now add the new estimate of available oversubscribed resources.
  slaves[slaveId].available += oversubscribed - allocation;

  LOG(INFO) << "Slave " << slaveId << " (" << slaves[slaveId].hostname
            << ") updated with oversubscribed resources " << oversubscribed
            << " (total: " << slaves[slaveId].total
            << ", available: " << slaves[slaveId].available << ")";

  allocate(slaveId);
}

template <class RoleSorter, class FrameworkSorter>
void
HierarchicalAllocatorProcess<RoleSorter, FrameworkSorter>::addSlave(
    const SlaveID& slaveId,
    const SlaveInfo& slaveInfo,
    const Resources& total,
    const hashmap<FrameworkID, Resources>& used)
{
  CHECK(initialized);
  CHECK(!slaves.contains(slaveId));

  roleSorter->add(slaveId, total.unreserved());

  foreachpair (const FrameworkID& frameworkId,
               const Resources& allocated,
               used) {
    if (frameworks.contains(frameworkId)) {
      const std::string& role = frameworks[frameworkId].role;

      // TODO(bmahler): Validate that the reserved resources have the
      // framework's role.

      roleSorter->allocated(role, slaveId, allocated.unreserved());
      frameworkSorters[role]->add(slaveId, allocated);
      frameworkSorters[role]->allocated(
          frameworkId.value(), slaveId, allocated);
    }
  }
  ...
}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
