austince commented on pull request #15497:
URL: https://github.com/apache/flink/pull/15497#issuecomment-815329177


   
   > Unless I'm missing anything, then here is an example where this happens:
   > 
   > ```
   > P1= 80 => MP1=128
   > P2=100 => MP2=256
   > ```
   > 
   > So similarly to option 2, with option 1 we still have this inconsistency that can very well break existing jobs when migrated to the adaptive scheduler, _or at some point in the future after migration_. The only way to prevent that is option 3, or, option 4: outright reject jobs that have not explicitly set the max parallelism.
   
   That is indeed possible; I just created a test case that proves it. 😞
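   
   For reference, here is a minimal, self-contained sketch of how I believe the default max parallelism is derived when none is set explicitly (my own reproduction of what `KeyGroupRangeAssignment#computeDefaultMaxParallelism` appears to do, not the actual Flink source); it yields exactly the numbers quoted above:
   
   ```java
   // Sketch only: assumed formula min(32768, max(128, roundUpToPowerOfTwo(p + p / 2))).
   public class DefaultMaxParallelismSketch {
   
       private static final int LOWER_BOUND = 128;   // 1 << 7
       private static final int UPPER_BOUND = 32768; // 1 << 15
   
       static int defaultMaxParallelism(int parallelism) {
           int target = parallelism + parallelism / 2;
           return Math.min(UPPER_BOUND, Math.max(LOWER_BOUND, roundUpToPowerOfTwo(target)));
       }
   
       // Smallest power of two >= x, for x >= 1.
       static int roundUpToPowerOfTwo(int x) {
           return x <= 1 ? 1 : Integer.highestOneBit(x - 1) << 1;
       }
   
       public static void main(String[] args) {
           System.out.println(defaultMaxParallelism(80));  // 128, i.e. MP1 above
           System.out.println(defaultMaxParallelism(100)); // 256, i.e. MP2 above
       }
   }
   ```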
   
   So I think option 4 would be the simplest to get in, and it would not be a difficult constraint to communicate to users because a) the Adaptive Scheduler and Reactive Mode are new, "experimental" features, b) setting the max parallelism is already documented as a best practice for production jobs, and c) there is a solid follow-up solution (reading savepoints before creating the graph) that can immediately be queued up for the next release. I guess this is something @tillrohrmann and @knaufk (the original ticket author, FLINK-21844) should weigh in on?
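   
   For b), a rough sketch of what "explicitly setting the max parallelism" looks like on the user side (job name and values here are just placeholders):
   
   ```java
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
   
   public class ExplicitMaxParallelismExample {
       public static void main(String[] args) throws Exception {
           StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
   
           // Job-wide upper bound for rescaling (number of key groups).
           env.setMaxParallelism(256);
   
           // It can also be set per operator, e.g.:
           // stream.keyBy(...).map(...).setMaxParallelism(512);
   
           env.fromElements(1, 2, 3).print();
           env.execute("explicit-max-parallelism-example");
       }
   }
   ```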
   
   I think option 3 would only be a temporary solution and would get tricky, as there is no good communication between the scheduler and the execution vertex at the moment, unless I misunderstand the updates that option would require.

