YARN does not have that problem anyway, because YARN already sets the default parallelism to all slots.
On Thu, Mar 12, 2015 at 11:19 AM, Maximilian Michels <m...@apache.org> wrote:

> +1 for unifying the way to set the parallelism and deprecating the old
> methods.
>
> We had the AUTOMAX discussion before in the corresponding pull
> request. It seems to me that there are two orthogonal views on how
> resources should be allocated by default. I strongly agree with
> Robert.
>
> Users have exclusive access to resources or use a resource manager
> (YARN). They are often unaware of the parallelism and are turned off
> by the bad performance with parallelism of 1. Setting AUTOMAX by
> default gives the best possible Flink experience. After all, Flink
> doesn't even support proper sharing of resources at the moment. So
> scenarios where multiple users manually set the parallelism will cause
> problems with job canceling due to unavailable resources and missing
> queuing features.
>
> Let's leave it up to the advanced users to set the granularity of the
> parallelism and provide the best out of the box experience for Flink
> novices.
>
> Best regards,
> Max
>
> On Thu, Mar 12, 2015 at 10:31 AM, Robert Metzger <rmetz...@apache.org>
> wrote:
> > We can also make the change non-API breaking by adding an additional
> > method and deprecating the old one.
> >
> > Why would the AUTOMAX parallelism eat up all cluster resources? It would
> > only allocate all slots WITHIN the Flink cluster.
> > Those users (=new users) who would benefit from the AUTOMAX parallelism
> > probably have the parallelism per TaskManager set to 1 anyway.
> > Advanced users will set their parallelism / slots configuration properly
> > anyway.
> >
> > In my experience, most users:
> > - have exclusive access to a test cluster in the beginning (I don't think
> >   anybody who doesn't know the system at all would start Flink on a
> >   production cluster)
> > - or use YARN
> > - do not set any parallelism for jobs or slots per TaskManager.
> >
> > From these observations, I would actually set the number of slots on the
> > TaskManagers to the number of available CPUs.
> > And for the CLI frontend, I would by default let a job use all available
> > slots (most users don't know that Flink allows running multiple jobs at
> > the same time).
> >
> > If users want to change the behavior, they have to look into the
> > documentation.
> >
> > On Thu, Mar 12, 2015 at 10:20 AM, Fabian Hueske <fhue...@gmail.com>
> > wrote:
> >
> >> +1 for going consistently with parallelism. However, these are
> >> API-breaking changes and we need to mark them deprecated before
> >> throwing them out, IMO.
> >>
> >> I am not comfortable with using AUTOMAX as a default. This is fine on
> >> dedicated setups like YARN sessions, but will consume all available
> >> resources of a cluster if a user forgets to set the -p flag (or fix the
> >> DOP in the program). There is already a default-parallelism flag in the
> >> config and that value should be used, IMO.
> >>
> >> 2015-03-12 10:07 GMT+01:00 Robert Metzger (JIRA) <j...@apache.org>:
> >>
> >> >
> >> > [
> >> > https://issues.apache.org/jira/browse/FLINK-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358345#comment-14358345
> >> > ]
> >> >
> >> > Robert Metzger commented on FLINK-1679:
> >> > ---------------------------------------
> >> >
> >> > I would suggest to remove all occurrences of "degreeOfParallelism" in
> >> > the system and replace it by "parallelism" everywhere.
> >> > The CLI frontend for example also calls it {{-p}}, not {{-dop}}.
> >> >
> >> > I would also suggest to set the parallelism by default to {{AUTOMAX}}
> >> > in the CliFrontend.
> >> >
> >> > > Document how "degree of parallelism" / "parallelism" / "slots" are
> >> > > connected to each other
> >> > > -------------------------------------------------------------------------------------------
> >> > >
> >> > > Key: FLINK-1679
> >> > > URL: https://issues.apache.org/jira/browse/FLINK-1679
> >> > > Project: Flink
> >> > > Issue Type: Task
> >> > > Components: Documentation
> >> > > Affects Versions: 0.9
> >> > > Reporter: Robert Metzger
> >> > > Assignee: Ufuk Celebi
> >> > >
> >> > > I see too many users being confused about properly setting up Flink
> >> > > with respect to parallelism.
> >> >
> >> > --
> >> > This message was sent by Atlassian JIRA
> >> > (v6.3.4#6332)
> >> >
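
To make the two positions in the quoted thread concrete, here is a rough sketch of how the effective parallelism of a job could be resolved under either default. It is only an illustration and not Flink code: the function name, its parameters, and the AUTOMAX sentinel value are all assumptions made up for this example. The only things taken from the thread are the precedence of an explicit -p value, the configured default parallelism, AUTOMAX meaning "all slots within the Flink cluster", and the fact that a job asking for more slots than are free fails rather than being queued.

```python
from typing import Optional

# Hypothetical sentinel meaning "use every free slot"; not Flink's actual constant.
AUTOMAX = -1


def resolve_parallelism(cli_p: Optional[int], config_default: int,
                        free_slots: int, automax_by_default: bool) -> int:
    """Return the parallelism a job would run with in this simplified model."""
    if cli_p is not None:
        # An explicit -p value always wins.
        requested = cli_p
    elif automax_by_default:
        # Proposal 1: without -p, take all slots of the Flink cluster.
        requested = AUTOMAX
    else:
        # Proposal 2: without -p, fall back to the configured default.
        requested = config_default

    if requested == AUTOMAX:
        return free_slots
    if requested < 1:
        raise ValueError("parallelism must be at least 1")
    if requested > free_slots:
        # No queuing in this model: the job cannot be scheduled.
        raise RuntimeError("not enough free slots for the requested parallelism")
    return requested


if __name__ == "__main__":
    # A new user on an 8-slot cluster who never sets -p:
    print(resolve_parallelism(None, 1, 8, automax_by_default=True))
    print(resolve_parallelism(None, 1, 8, automax_by_default=False))
```

With AUTOMAX as the default, the first call uses all 8 slots; with the configured default of 1, the second call runs with a single slot, which is the slow out-of-the-box behavior described in the thread. Conversely, once one job holds all slots, a second job in this model has no free slots left, which is the concern raised about shared clusters.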