Re: Flink's multi-user support

Maximilian Michels Wed, 13 May 2015 03:16:11 -0700

Yes, it should be possible to implement both independently.

On Wed, May 13, 2015 at 11:41 AM, Stephan Ewen <se...@apache.org> wrote:


> On first thought, the sessions and the multi-job vs. job queue question are
> almost two separate issues.
>
> Can you add the sessions without removing the concurrent jobs we currently
> have?
>
> On Wed, May 13, 2015 at 10:34 AM, Maximilian Michels <m...@apache.org> wrote:
>
> > I think we can agree that real multi-user support in Flink (standalone) is
> > neither desirable, because there are already sophisticated solutions out
> > there (YARN or Mesos), nor feasible, because it is a lot of work to get it
> > right.
> >
> > As things stand, resource sharing between two users who submit jobs at
> > the same time is not handled properly. However, this discussion showed
> > that it is desirable to support submitting multiple jobs to a single
> > Flink cluster. This could be realized using a simple queuing system in
> > which jobs are executed one after another.
> >
> > With the soon-to-be-supported resuming of jobs from intermediate results,
> > multiple clients should still be able to refer to past jobs. The job
> > manager simply holds a list of old ExecutionGraphs for each user session.
> > When the user ends the session or a timeout occurs, the corresponding
> > graphs are archived. This calls for some form of session management.
> >
> > tl;dr I propose to drop the multi-user support that we have now. Instead,
> > let's have a one-job-at-a-time usage model with a queuing system and,
> > eventually, session management to deal with resuming from already
> > materialized results.
> >
> > What do you think?
> >
> > On Thu, Apr 30, 2015 at 11:09 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> >
> > > There was an attempt to build such a queue during the Dopa project, when
> > > Flink was still Stratosphere.
> > > It would probably be a good idea to collect the good and bad lessons
> > > learned from it to start designing the new scheduler :)
> > >
> > > On Thu, Apr 30, 2015 at 10:08 AM, Stephan Ewen <se...@apache.org> wrote:
> > >
> > > > Most components are written to be multi-job aware.
> > > >
> > > > The only thing that is not in there right now is scheduling policies
> > > > for fair resource sharing. This is important in shared clusters.
> > > >
> > > > Since YARN implements all those things (various job queues with
> > > > different priorities/policies, etc.), I suggest we not try to rebuild
> > > > them in Flink and simply declare a JobManager a "single-job-at-a-time"
> > > > manager. You can still run an interactive session with many jobs one
> > > > after another.
> > > >
> > > >
> > > > On Wed, Apr 29, 2015 at 7:07 PM, Maximilian Michels <m...@apache.org> wrote:
> > > >
> > > > > >
> > > > > > > However, dropping it completely instead of improving it would make
> > > > > > > Flink setups on dedicated clusters quite useless, right?
> > > > > >
> > > > >
> > > > > > Not really, because you could also use YARN on dedicated clusters for
> > > > > > proper multi-user support.
> > > > >
> > > > > > On Wed, Apr 29, 2015 at 5:51 PM, Fabian Hueske <fhue...@gmail.com> wrote:
> > > > >
> > > > > > > I agree that Flink's multi-user support is not very good at the
> > > > > > > moment. However, dropping it completely instead of improving it
> > > > > > > would make Flink setups on dedicated clusters quite useless, right?
> > > > > >
> > > > > >
> > > > > > 2015-04-29 17:33 GMT+02:00 Maximilian Michels <m...@apache.org>:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > > Currently Flink accepts jobs from multiple clients and executes
> > > > > > > > them concurrently if the resource limits are not exceeded.
> > > > > > > > However, the multi-user support is very poor. We don't support
> > > > > > > > queuing of jobs, and concurrent jobs have to share resources
> > > > > > > > cooperatively; otherwise, jobs will fail.
> > > > > > >
> > > > > > > > Using YARN, we circumvent these problems because it provides
> > > > > > > > proper user and session management. I'm wondering now, should we
> > > > > > > > get rid of the pseudo multi-user mode and just support one user
> > > > > > > > per Flink cluster instance?
> > > > > > >
> > > > > > > Best,
> > > > > > > Max
> > > > > > >
> > > > > > > PS:
> > > > > > > > This question came up when I was working on a pull request to
> > > > > > > > support backtracking intermediate results. I need to hold a copy
> > > > > > > > of the full previous execution graph to resume from old results.
> > > > > > > > With multiple users, we have to build in some kind of session
> > > > > > > > management to archive old execution graphs. Otherwise, they will
> > > > > > > > consume too much memory in the job manager.
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
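
For illustration only: the "simple queuing system in which jobs are executed
one after another" proposed in the quoted message of May 13 could look roughly
like the sketch below. It is not Flink code. The class OneJobAtATimeQueue, its
methods, and the use of plain Python callables as "jobs" are all assumptions
made for this example.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class OneJobAtATimeQueue:
    """Hypothetical FIFO queue: many clients may submit, one job runs at a time."""

    def __init__(self) -> None:
        # Jobs waiting to run, in submission order.
        self._pending: Deque[Tuple[str, Callable[[], object]]] = deque()
        # (job_id, result) pairs in the order the jobs completed.
        self.finished: List[Tuple[str, object]] = []

    def submit(self, job_id: str, job: Callable[[], object]) -> int:
        """Enqueue a job and return its position in the queue (0 = next to run)."""
        self._pending.append((job_id, job))
        return len(self._pending) - 1

    def run_next(self) -> bool:
        """Run the oldest pending job to completion; return False if none is pending."""
        if not self._pending:
            return False
        job_id, job = self._pending.popleft()
        # The job owns all resources while it runs, so no sharing policy is needed.
        self.finished.append((job_id, job()))
        return True

    def run_all(self) -> None:
        """Drain the queue, executing jobs strictly one after another."""
        while self.run_next():
            pass
```

Two clients submitting at the same time would each call submit(); a single
loop calling run_all() then executes their jobs in submission order, which
sidesteps the resource-sharing problem at the cost of concurrency.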

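For illustration only: the session management described in the quoted message
of May 13 (the job manager keeps old ExecutionGraphs per user session and
archives them when the session ends or times out) might be sketched as
follows. It is not Flink code. SessionManager, its method names, and the
explicit `now` parameter are assumptions for this example, and graphs are
represented by arbitrary Python objects.

```python
import time
from typing import Dict, List, Optional


class SessionManager:
    """Hypothetical per-session store of old execution graphs with idle timeout."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self._graphs: Dict[str, List[object]] = {}   # session id -> retained graphs
        self._last_seen: Dict[str, float] = {}       # session id -> last activity time
        # Stand-in for an archive; a real one would keep only a compact summary.
        self.archive: List[object] = []

    def record_graph(self, session_id: str, graph: object,
                     now: Optional[float] = None) -> None:
        """Retain a finished job's graph so later jobs in the session can resume from it."""
        now = time.monotonic() if now is None else now
        self._graphs.setdefault(session_id, []).append(graph)
        self._last_seen[session_id] = now

    def graphs_for(self, session_id: str) -> List[object]:
        """Return the graphs still held in memory for a session."""
        return list(self._graphs.get(session_id, []))

    def end_session(self, session_id: str) -> None:
        """Archive all graphs of a session and drop them from the live store."""
        self.archive.extend(self._graphs.pop(session_id, []))
        self._last_seen.pop(session_id, None)

    def expire_idle(self, now: Optional[float] = None) -> List[str]:
        """End every session idle for longer than the timeout; return their ids."""
        now = time.monotonic() if now is None else now
        expired = [sid for sid, seen in self._last_seen.items()
                   if now - seen > self.timeout_seconds]
        for sid in expired:
            self.end_session(sid)
        return expired
```

Nothing here depends on how jobs are scheduled, which matches the point made
at the top of the thread that sessions and the job queue can be implemented
independently.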