[
https://issues.apache.org/jira/browse/MESOS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274311#comment-14274311
]
Dobromir Montauk commented on MESOS-1554:
-----------------------------------------
Separating "resources" from the "running job" makes a lot of sense. That's how
Borg at Google works.
They have a separate concept, "allocation", that you can use (but don't have
to). You define an allocation just like a task/job - how much CPU, RAM, etc it
gets. Then you can put your tasks "into" the allocation. They have their own
CPU, RAM, etc requirements and obviously have to fit.
Borg then has separate commands for allocations and jobs. If you just touch the
job (up/down/restart/etc.), the allocation sticks around and can be
reused - all disk resources, CPU reservations, etc. are still there. Note that
allocations *must* support more than just "persistent disk" - otherwise
there's a chance the job won't schedule because the CPU/RAM has been taken by
someone else, and you've just lost all your "persistence" benefits! To wipe away
the job entirely you have to remove the allocation itself (which, being very
dangerous, was usually secured with a different permission set than the job).
It looks like the current design is mostly about "persistent disk", but I'm
not sure that's really going to work in the longer term. We should make
"allocations" first-class objects that, like tasks, can reserve anything, and
have jobs just run inside an alloc. A rough sketch of the idea follows.
> Persistent resources support for storage-like services
> ------------------------------------------------------
>
> Key: MESOS-1554
> URL: https://issues.apache.org/jira/browse/MESOS-1554
> Project: Mesos
> Issue Type: Epic
> Components: general, hadoop
> Reporter: Nikita Vetoshkin
> Priority: Minor
> Labels: twitter
>
> This question came up in [dev mailing
> list|http://mail-archives.apache.org/mod_mbox/mesos-dev/201406.mbox/%3CCAK8jAgNDs9Fe011Sq1jeNr0h%3DE-tDD9rak6hAsap3PqHx1y%3DKQ%40mail.gmail.com%3E].
> It seems reasonable for storage-like services (e.g. HDFS or Cassandra) to use
> Mesos to manage their instances. But right now, if we'd like to restart an
> instance (e.g. to spin up a new version), all of the previous instance's
> sandbox filesystem resources will be recycled by the slave's garbage
> collector.
> At the moment filesystem resources can be managed out of band - i.e.
> instances can save their data in some database-specific place that various
> instances can share (e.g. {{/var/lib/cassandra}}).
> [~benjaminhindman] suggested an idea in the mailing list (though it still
> needs some fleshing out):
> {quote}
> The idea originally came about because, even today, if we allocate some
> file system space to a task/executor, and then that task/executor
> terminates, we haven't officially "freed" those file system resources until
> after we garbage collect the task/executor sandbox! (We keep the sandbox
> around so a user/operator can get the stdout/stderr or anything else left
> around from their task/executor.)
> To solve this problem we wanted to be able to let a task/executor terminate
> but not *give up* all of its resources, hence: persistent resources.
> Pushing this concept even further you could imagine always reallocating
> resources to a framework that had already been allocated those resources
> for a previous task/executor. Looked at from another perspective, these are
> "late-binding", or "lazy", resource reservations.
> At one point in time we had considered just doing 'right-of-first-refusal'
> for allocations after a task/executor terminates. But this is really
> insufficient for supporting storage-like frameworks well (and likely even
> harder to reliably implement than 'persistent resources', IMHO).
> There are a ton of things that need to get worked out in this model,
> including (but not limited to): how should a file system (or disk) be
> exposed in order to be made persistent? How should persistent resources be
> returned to a master? How many persistent resources can a framework get
> allocated?
> {quote}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)