YARN does not have that problem anyway, because YARN sets the default
parallelism to all slots.
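
To make the slots / parallelism relationship discussed in this thread
concrete, here is a minimal Java sketch. It is not Flink code: the class,
the method names, and the AUTOMAX sentinel value are hypothetical, and the
precedence (explicit -p value first, then the configured default) is an
assumption made for illustration only.

```java
import java.util.OptionalInt;

/** Illustrative model only; none of these names are Flink API. */
final class ParallelismSketch {

    /** Hypothetical sentinel meaning "use every available slot". */
    static final int AUTOMAX = Integer.MAX_VALUE;

    /** Total slots in the cluster: TaskManagers times slots per TaskManager. */
    static int totalSlots(int taskManagers, int slotsPerTaskManager) {
        if (taskManagers < 0 || slotsPerTaskManager < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        return Math.multiplyExact(taskManagers, slotsPerTaskManager);
    }

    /**
     * Resolves the parallelism a job would run with in this model.
     * Assumed precedence: an explicit -p value wins, otherwise the
     * configured default is used. AUTOMAX expands to all slots.
     */
    static int resolve(OptionalInt cliParallelism, int configuredDefault, int totalSlots) {
        int requested = cliParallelism.orElse(configuredDefault);
        if (requested == AUTOMAX) {
            return totalSlots;
        }
        if (requested < 1) {
            throw new IllegalArgumentException("parallelism must be at least 1");
        }
        return requested;
    }

    public static void main(String[] args) {
        int slots = totalSlots(4, 8); // 4 TaskManagers with 8 slots each

        // No -p flag, configured default of 1: the job uses a single slot.
        System.out.println(resolve(OptionalInt.empty(), 1, slots));

        // No -p flag, default set to AUTOMAX: the job uses all slots.
        System.out.println(resolve(OptionalInt.empty(), AUTOMAX, slots));

        // An explicit -p 16 overrides the default.
        System.out.println(resolve(OptionalInt.of(16), AUTOMAX, slots));
    }
}
```

In this model the two positions in the thread differ only in the value
passed as configuredDefault: 1 (or another fixed value from the config)
versus AUTOMAX.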


On Thu, Mar 12, 2015 at 11:19 AM, Maximilian Michels <m...@apache.org> wrote:

> +1 for unifying the way to set the parallelism and deprecating the old
> methods.
>
> We had the AUTOMAX discussion before in the corresponding pull
> request. It seems that there are two orthogonal views on how
> resources should be allocated by default. I strongly agree with
> Robert.
>
> Users have exclusive access to resources or use a resource manager
> (YARN). They are often unaware of the parallelism and are turned off
> by the bad performance with parallelism of 1. Setting AUTOMAX by
> default gives the best possible Flink experience. After all, Flink
> doesn't even support proper sharing of resources at the moment. So
> scenarios where multiple users manually set the parallelism will cause
> problems with job canceling due to unavailable resources and missing
> queuing features.
>
> Let's leave it up to the advanced users to set the granularity of the
> parallelism and provide the best out of the box experience for Flink
> novices.
>
> Best regards,
> Max
>
> On Thu, Mar 12, 2015 at 10:31 AM, Robert Metzger <rmetz...@apache.org>
> wrote:
> > We can also make the change non-API breaking by adding an additional
> > method and deprecating the old one.
> >
> >
> > Why would the AUTOMAX parallelism eat up all cluster resources? It would
> > only allocate all slots WITHIN the Flink cluster.
> > Those users (= new users) who would benefit from the AUTOMAX parallelism
> > have probably set the parallelism per TaskManager to 1 anyway.
> > Advanced users will set their parallelism / slots configuration
> > properly anyway.
> >
> > In my experience, most users:
> > - have exclusive access to a test cluster in the beginning (I don't think
> > anybody who doesn't know the system at all would start Flink on a
> > production cluster)
> > - or use YARN
> > - do not set any parallelism for jobs or slots per TaskManager.
> >
> > From these observations, I would actually set the number of slots on the
> > TaskManagers to the number of available CPUs.
> > And for the CLI frontend, I would by default let a job use all available
> > slots (most users don't know that Flink allows running multiple jobs at
> > the same time).
> >
> > If users want to change the behavior, they have to look into the
> > documentation.
> >
> > On Thu, Mar 12, 2015 at 10:20 AM, Fabian Hueske <fhue...@gmail.com> wrote:
> >
> >> +1 for going consistently with parallelism. However, these are
> >> API-breaking changes and we need to mark them deprecated before
> >> throwing them out, IMO.
> >>
> >> I am not comfortable with using AUTOMAX as a default. This is fine on
> >> dedicated setups like YARN sessions, but will consume all available
> >> resources of a cluster if a user forgets to set the -p flag (or fix the
> >> DOP in the program). There is already a default-parallelism flag in the
> >> config and that value should be used, IMO.
> >>
> >> 2015-03-12 10:07 GMT+01:00 Robert Metzger (JIRA) <j...@apache.org>:
> >>
> >> >
> >> >     [ https://issues.apache.org/jira/browse/FLINK-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358345#comment-14358345 ]
> >> >
> >> > Robert Metzger commented on FLINK-1679:
> >> > ---------------------------------------
> >> >
> >> > I would suggest removing all occurrences of "degreeOfParallelism" in the
> >> > system and replacing them with "parallelism" everywhere.
> >> > The CLI frontend, for example, also calls it {{-p}}, not {{-dop}}.
> >> >
> >> > I would also suggest setting the parallelism to {{AUTOMAX}} by default in
> >> > the CliFrontend.
> >> >
> >> > > Document how "degree of parallelism" / "parallelism" / "slots" are
> >> > > connected to each other
> >> > > -------------------------------------------------------------------
> >> > >
> >> > >                 Key: FLINK-1679
> >> > >                 URL: https://issues.apache.org/jira/browse/FLINK-1679
> >> > >             Project: Flink
> >> > >          Issue Type: Task
> >> > >          Components: Documentation
> >> > >    Affects Versions: 0.9
> >> > >            Reporter: Robert Metzger
> >> > >            Assignee: Ufuk Celebi
> >> > >
> >> > > I see too many users being confused about properly setting up Flink
> >> > > with respect to parallelism.
> >> >
> >>
>
