Here we go: https://issues.apache.org/jira/browse/MESOS-1554. I didn't know how to name an epic, though, so I set the ticket type to "Story".
On Fri, Jun 27, 2014 at 1:22 AM, Benjamin Hindman <[email protected]> wrote:

> Wanted to jump in here and provide some context on 'persistent resources'. As Vinod mentioned, this is how we're thinking about enabling storage-like frameworks on Mesos.
>
> The idea originally came about because, even today, if we allocate some file system space to a task/executor, and then that task/executor terminates, we haven't officially "freed" those file system resources until after we garbage collect the task/executor sandbox! (We keep the sandbox around so a user/operator can get the stdout/stderr or anything else left around from their task/executor.)
>
> To solve this problem we wanted to be able to let a task/executor terminate but not *give up* all of its resources, hence: persistent resources.
>
> Pushing this concept even further, you could imagine always reallocating resources to a framework that had already been allocated those resources for a previous task/executor. Looked at from another perspective, these are "late-binding", or "lazy", resource reservations.
>
> At one point in time we had considered just doing 'right-of-first-refusal' for allocations after a task/executor terminates. But this is really insufficient for supporting storage-like frameworks well (and likely even harder to reliably implement than 'persistent resources' IMHO).
>
> There are a ton of things that need to get worked out in this model, including (but not limited to): how should a file system (or disk) be exposed in order to be made persistent? How should persistent resources be returned to a master? How many persistent resources can a framework get allocated?
>
> The right place to capture this all is in an "Epic" ticket on JIRA. Nikita, do you want to create a ticket? If not, no worries, I'm happy to create the ticket. Really looking forward to seeing this develop!
>
> Ben.
>
>
> On Thu, Jun 26, 2014 at 11:33 AM, Vinod Kone <[email protected]> wrote:
>
> > SGTM. Feel free to create the ticket!
> >
> > On Thu, Jun 26, 2014 at 11:20 AM, Vetoshkin Nikita <[email protected]> wrote:
> >
> > > Thanks, Vinod! I really like the "persistent resources" idea. Maybe there should be a ticket for discussion and brainstorming?
> > >
> > > On Jun 26, 2014 11:06 PM, "Vinod Kone" <[email protected]> wrote:
> > >
> > > > As Maxime mentioned, the long-term solution is for Mesos to support the notion of "persistent resources", i.e., resources that stay (and are accounted for) after the life cycle of a task/executor. The idea still needs fleshing out.
> > > >
> > > > On Thu, Jun 26, 2014 at 8:23 AM, Vetoshkin Nikita <[email protected]> wrote:
> > > >
> > > > > What about a long-term solution? Any ideas? Twitter's Manhattan database claims to use Mesos for scaling up and down. Can you shed some light on how they deal with situations like this?
> > > > >
> > > > > On Jun 26, 2014 5:01 AM, "Vinod Kone" <[email protected]> wrote:
> > > > >
> > > > > > Thanks for listing this out, Adam.
> > > > > >
> > > > > > Data Residency:
> > > > > > > - Should we destroy the sandbox/hdfs-data when shutting down a DN?
> > > > > > > - If starting a DN on a node that was previously running a DN, can/should we try to revive the existing data?
> > > > > >
> > > > > > I think this is one of the key challenges for a production-quality HDFS on Mesos.
> > > > > > Currently, since the sandbox is deleted after a task exits, if all the data nodes that hold a block (and its replicas) get lost/killed for whatever reason, there would be data loss. A short-term solution would be to write outside the sandbox and use slave attributes to track where to re-launch data node tasks.
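
To make the "late-binding reservation" idea above a bit more concrete before we flesh out MESOS-1554, here is a toy sketch of the bookkeeping an allocator might do. Everything here is illustrative; none of these class or method names exist in Mesos. The only point is that a persistent resource released by a terminated task stays pinned to the framework that last held it and is only re-offered to that framework.

    import java.util.HashMap;
    import java.util.Map;

    // Toy sketch only (not a Mesos API): a persistent resource released by a
    // terminated task/executor stays pinned to its framework and is only
    // re-offered to that same framework.
    public class PersistentResourceLedger {
        // resource id (e.g. a volume id) -> framework id it stays pinned to
        private final Map<String, String> pinnedTo = new HashMap<>();

        /** Called when the task/executor using a persistent resource terminates. */
        public void release(String resourceId, String frameworkId) {
            // The resource is not returned to the general pool; it stays pinned.
            pinnedTo.put(resourceId, frameworkId);
        }

        /** May this resource be offered to the given framework? */
        public boolean offerable(String resourceId, String frameworkId) {
            String owner = pinnedTo.get(resourceId);
            return owner == null || owner.equals(frameworkId);
        }

        public static void main(String[] args) {
            PersistentResourceLedger ledger = new PersistentResourceLedger();
            ledger.release("volume-42", "hdfs-framework");
            System.out.println(ledger.offerable("volume-42", "hdfs-framework"));  // true
            System.out.println(ledger.offerable("volume-42", "spark-framework")); // false
        }
    }

How and when such pinned resources eventually get handed back to the master is exactly one of the open questions Ben lists above.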
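
And for the short-term workaround at the end of the quoted thread (write DataNode blocks outside the sandbox and use slave attributes to find them again), the scheduler-side bookkeeping could be as small as the sketch below. The "dn-7" attribute value, the data directory path, and the DataDirRegistry class are all hypothetical, just to illustrate the matching; a real framework would read the attribute off incoming offers and reuse the recorded directory when re-launching the DataNode task.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Optional;

    // Hypothetical scheduler-side bookkeeping for the short-term workaround:
    // remember which slave (identified here by a made-up "dn_id" attribute)
    // wrote DataNode blocks to which directory outside the sandbox, so a new
    // DataNode task can be re-launched against the same data.
    public class DataDirRegistry {
        // attribute value (e.g. "dn-7") -> data directory outside the sandbox
        private final Map<String, String> attributeToDataDir = new HashMap<>();

        /** Record where the DataNode on this slave wrote its blocks. */
        public void record(String attributeValue, String dataDir) {
            attributeToDataDir.put(attributeValue, dataDir);
        }

        /** Data directory to reuse, if this slave previously ran a DataNode. */
        public Optional<String> dataDirFor(String offerAttributeValue) {
            return Optional.ofNullable(attributeToDataDir.get(offerAttributeValue));
        }

        public static void main(String[] args) {
            DataDirRegistry registry = new DataDirRegistry();
            // A DataNode task on the slave tagged dn_id=dn-7 wrote outside its sandbox:
            registry.record("dn-7", "/var/lib/hdfs/dn-7/data");

            // Later, an offer arrives from a slave advertising dn_id=dn-7:
            registry.dataDirFor("dn-7").ifPresentOrElse(
                dir -> System.out.println("Re-launch DataNode reusing " + dir),
                () -> System.out.println("Fresh slave; start DataNode with an empty data dir"));
        }
    }

This only helps when the node itself comes back, of course; as noted above, permanently losing every node that held a block's replicas still means data loss.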
