Here we go: https://issues.apache.org/jira/browse/MESOS-1554. I didn't know how to name an epic, though, so I set the ticket type to "Story".
On Fri, Jun 27, 2014 at 1:22 AM, Benjamin Hindman <[email protected]> wrote:

> Wanted to jump in here and provide some context on 'persistent resources'. As Vinod mentioned, this is how we're thinking about enabling storage-like frameworks on Mesos.
>
> The idea originally came about because, even today, if we allocate some file system space to a task/executor, and then that task/executor terminates, we haven't officially "freed" those file system resources until after we garbage collect the task/executor sandbox! (We keep the sandbox around so a user/operator can get the stdout/stderr or anything else left around from their task/executor.)
>
> To solve this problem we wanted to be able to let a task/executor terminate but not *give up* all of its resources, hence: persistent resources.
>
> Pushing this concept even further, you could imagine always reallocating resources to a framework that had already been allocated those resources for a previous task/executor. Looked at from another perspective, these are "late-binding", or "lazy", resource reservations.
>
> At one point in time we had considered just doing 'right-of-first-refusal' for allocations after a task/executor terminates. But this is really insufficient for supporting storage-like frameworks well (and likely even harder to reliably implement than 'persistent resources' IMHO).
>
> There are a ton of things that need to get worked out in this model, including (but not limited to): how should a file system (or disk) be exposed in order to be made persistent? How should persistent resources be returned to a master? How many persistent resources can a framework get allocated?
>
> The right place to capture this all is in an "Epic" ticket on JIRA. Nikita, do you want to create a ticket? If not, no worries, I'm happy to create the ticket. Really looking forward to seeing this develop!
>
> Ben.
>
>
> On Thu, Jun 26, 2014 at 11:33 AM, Vinod Kone <[email protected]> wrote:
>
> > SGTM. Feel free to create the ticket!
> >
> > On Thu, Jun 26, 2014 at 11:20 AM, Vetoshkin Nikita <[email protected]> wrote:
> >
> > > Thanks, Vinod! I really like the "persistent resources" idea. Maybe there should be a ticket for discussion and brainstorming?
> > >
> > > On Jun 26, 2014 11:06 PM, "Vinod Kone" <[email protected]> wrote:
> > >
> > > > As Maxime mentioned, the long-term solution is for Mesos to support the notion of "persistent resources", i.e., resources that stay (and are accounted for) after the life cycle of a task/executor. The idea still needs fleshing out.
> > > >
> > > > On Thu, Jun 26, 2014 at 8:23 AM, Vetoshkin Nikita <[email protected]> wrote:
> > > >
> > > > > What about a long-term solution? Any ideas? Twitter's Manhattan database claims to use Mesos for scaling up and down. Can you shed some light on how they deal with situations like this?
> > > > >
> > > > > On Jun 26, 2014 5:01 AM, "Vinod Kone" <[email protected]> wrote:
> > > > >
> > > > > > Thanks for listing this out, Adam.
> > > > > >
> > > > > > Data Residency:
> > > > > > > - Should we destroy the sandbox/hdfs-data when shutting down a DN?
> > > > > > > - If starting a DN on a node that was previously running a DN, can/should we try to revive the existing data?
> > > > > >
> > > > > > I think this is one of the key challenges for a production-quality HDFS on Mesos.
> > > > > > Currently, since the sandbox is deleted after a task exits, if all the data nodes that hold a block (and its replicas) get lost/killed for whatever reason, there would be data loss. A short-term solution would be to write outside the sandbox and use slave attributes to track where to re-launch data node tasks.
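
To make the "late-binding reservation" idea above a bit more concrete before we flesh out MESOS-1554, here is a toy sketch of the bookkeeping an allocator might do. Everything here is illustrative; none of these class or method names exist in Mesos. The only point is that a persistent resource released by a terminated task stays pinned to the framework that last held it and is only re-offered to that framework.

    import java.util.HashMap;
    import java.util.Map;

    // Toy sketch only (not a Mesos API): a persistent resource released by a
    // terminated task/executor stays pinned to its framework and is only
    // re-offered to that same framework.
    public class PersistentResourceLedger {
        // resource id (e.g. a volume id) -> framework id it stays pinned to
        private final Map<String, String> pinnedTo = new HashMap<>();

        /** Called when the task/executor using a persistent resource terminates. */
        public void release(String resourceId, String frameworkId) {
            // The resource is not returned to the general pool; it stays pinned.
            pinnedTo.put(resourceId, frameworkId);
        }

        /** May this resource be offered to the given framework? */
        public boolean offerable(String resourceId, String frameworkId) {
            String owner = pinnedTo.get(resourceId);
            return owner == null || owner.equals(frameworkId);
        }

        public static void main(String[] args) {
            PersistentResourceLedger ledger = new PersistentResourceLedger();
            ledger.release("volume-42", "hdfs-framework");
            System.out.println(ledger.offerable("volume-42", "hdfs-framework"));  // true
            System.out.println(ledger.offerable("volume-42", "spark-framework")); // false
        }
    }

How and when such pinned resources eventually get handed back to the master is exactly one of the open questions Ben lists above.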
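
And for the short-term workaround at the end of the quoted thread (write DataNode blocks outside the sandbox and use slave attributes to find them again), the scheduler-side bookkeeping could be as small as the sketch below. The "dn-7" attribute value, the data directory path, and the DataDirRegistry class are all hypothetical, just to illustrate the matching; a real framework would read the attribute off incoming offers and reuse the recorded directory when re-launching the DataNode task.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Optional;

    // Hypothetical scheduler-side bookkeeping for the short-term workaround:
    // remember which slave (identified here by a made-up "dn_id" attribute)
    // wrote DataNode blocks to which directory outside the sandbox, so a new
    // DataNode task can be re-launched against the same data.
    public class DataDirRegistry {
        // attribute value (e.g. "dn-7") -> data directory outside the sandbox
        private final Map<String, String> attributeToDataDir = new HashMap<>();

        /** Record where the DataNode on this slave wrote its blocks. */
        public void record(String attributeValue, String dataDir) {
            attributeToDataDir.put(attributeValue, dataDir);
        }

        /** Data directory to reuse, if this slave previously ran a DataNode. */
        public Optional<String> dataDirFor(String offerAttributeValue) {
            return Optional.ofNullable(attributeToDataDir.get(offerAttributeValue));
        }

        public static void main(String[] args) {
            DataDirRegistry registry = new DataDirRegistry();
            // A DataNode task on the slave tagged dn_id=dn-7 wrote outside its sandbox:
            registry.record("dn-7", "/var/lib/hdfs/dn-7/data");

            // Later, an offer arrives from a slave advertising dn_id=dn-7:
            registry.dataDirFor("dn-7").ifPresentOrElse(
                dir -> System.out.println("Re-launch DataNode reusing " + dir),
                () -> System.out.println("Fresh slave; start DataNode with an empty data dir"));
        }
    }

This only helps when the node itself comes back, of course; as noted above, permanently losing every node that held a block's replicas still means data loss.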
