austince edited a comment on pull request #15497: URL: https://github.com/apache/flink/pull/15497#issuecomment-815329177
> Unless I'm missing anything, then here is an example where this happens:
>
> ```
> P1= 80 => MP1=128
> P2=100 => MP2=256
> ```

So, similarly to option 2, with option 1 we still have this inconsistency that can very well break existing jobs when they are migrated to the adaptive scheduler, _or at some point in the future after migration_. The only way to prevent that is option 3, or option 4: outright reject jobs that have not explicitly set the max parallelism. That is possible; I just created a test case that proves it. 😞

So, I think option 4 (require max parallelism to be set) would be the simplest to get in, and it is not a difficult constraint to communicate to users because:

- a) the adaptive scheduler + Reactive Mode are new, "experimental" features;
- b) setting max parallelism on all operators is already documented as a best practice for production jobs;
- c) there is a solid solution that can immediately be queued up for the next release (reading savepoints before creating the graph).

I guess this is something @tillrohrmann + @knaufk (original author of FLINK-21844) should weigh in on?

I think option 3 would only be a temporary solution and would get tricky, as there is currently no communication between the scheduler and the state restore, and there are quite a few layers in between. Unless I misunderstand the updates that option would require.
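For context on where the quoted numbers come from: when no max parallelism is set explicitly, Flink derives a default from the operator parallelism (roughly 1.5x the parallelism, rounded up to the next power of two and clamped to the [128, 32768] range). The sketch below is my approximation of that heuristic (the class/method name and bounds are from memory, not copied from the codebase); it reproduces the `80 => 128` / `100 => 256` jump in the quoted example:

```java
// Rough sketch of the default max-parallelism heuristic (my approximation of
// KeyGroupRangeAssignment#computeDefaultMaxParallelism): ~1.5x the operator
// parallelism, rounded up to the next power of two, clamped to [128, 32768].
public class DefaultMaxParallelismSketch {

    private static final int LOWER_BOUND = 1 << 7;   // 128
    private static final int UPPER_BOUND = 1 << 15;  // 32768

    static int defaultMaxParallelism(int operatorParallelism) {
        int scaled = operatorParallelism + operatorParallelism / 2;               // ~1.5x
        int powerOfTwo = Integer.highestOneBit(Math.max(scaled - 1, 1)) << 1;     // smallest power of two >= scaled
        return Math.min(Math.max(powerOfTwo, LOWER_BOUND), UPPER_BOUND);
    }

    public static void main(String[] args) {
        System.out.println(defaultMaxParallelism(80));   // 128 (1.5 * 80 = 120 -> 128)
        System.out.println(defaultMaxParallelism(100));  // 256 (1.5 * 100 = 150 -> 256)
    }
}
```

The point being: two different parallelisms can land on different default max parallelisms, so state keyed at one parallelism can become incompatible at another unless the max parallelism is pinned explicitly (e.g. `setMaxParallelism(...)` on the environment or per operator), which is what option 4 would force users to do.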
