austince edited a comment on pull request #15497:
URL: https://github.com/apache/flink/pull/15497#issuecomment-815329177


   > Unless I'm missing anything, then here is an example where this happens:
   > 
   > ```
   > P1= 80 => MP1=128
   > P2=100 => MP2=256
   > ```
   > 
   > So similarly to option 2, with option 1 we still have this inconsistency that can very well break existing jobs when migrated to the adaptive scheduler, _or at some point in the future after migration_. The only way to prevent that is option 3, or, option 4: outright reject jobs that have not explicitly set the max parallelism.
   
   That is possible; I just created a test case that proves it. 😞 
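
For reference, the numbers in the quoted example can be reproduced with a minimal sketch of how the default max parallelism appears to be derived. It assumes the rule is "round `p + p/2` up to the next power of two, then clamp to [128, 32768]"; the class and method names below are illustrative stand-ins, not the actual Flink classes.

```java
// Sketch only: assumes default max parallelism = clamp(roundUpToPowerOfTwo(p + p/2), 128, 32768).
// MaxParallelismSketch and its methods are hypothetical names, not Flink API.
final class MaxParallelismSketch {
    static final int LOWER_BOUND = 1 << 7;  // 128
    static final int UPPER_BOUND = 1 << 15; // 32768

    // Smallest power of two >= x, for 1 <= x <= 2^30.
    static int roundUpToPowerOfTwo(int x) {
        int highest = Integer.highestOneBit(x);
        return highest == x ? x : highest << 1;
    }

    // Derives a max parallelism for a job that did not set one explicitly.
    static int computeDefaultMaxParallelism(int parallelism) {
        if (parallelism <= 0 || parallelism > UPPER_BOUND) {
            throw new IllegalArgumentException("parallelism out of range: " + parallelism);
        }
        int candidate = roundUpToPowerOfTwo(parallelism + parallelism / 2);
        return Math.min(Math.max(candidate, LOWER_BOUND), UPPER_BOUND);
    }

    public static void main(String[] args) {
        // The two parallelisms from the quoted example.
        for (int p : new int[] {80, 100}) {
            System.out.println("P=" + p + " => MP=" + computeDefaultMaxParallelism(p));
        }
    }
}
```

Under that assumption, the derived value jumps from 128 to 256 between parallelism 85 and 86, so a job that is rescaled across that boundary without an explicit max parallelism ends up with a different derived value than the one it started with.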
   
   So, I think option 4 would be the simplest to get in, and not a difficult 
constraint to communicate to users, because a) the adaptive scheduler and 
Reactive Mode are new, "experimental" features, b) setting max parallelism on 
all operators is already documented as a best practice for production jobs, and 
c) there is a solid solution that can immediately be queued up for the next 
release (reading savepoints before creating the graph). I guess this is 
something @tillrohrmann + @knaufk (original ticket author, FLINK-21844) should 
weigh in on?
   
   I think option 3 would be a temporary solution and would get tricky, as 
there is currently no communication between the scheduler and the state 
restore, and there are quite a few layers in between, unless I misunderstand 
the necessary updates for that option.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

