Myasuka commented on issue #7009: [FLINK-10712] Support to restore state when 
using RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/7009#issuecomment-455623323
 
 
   @StefanRRichter I have went through all production code using union state as 
below:
   
   For `ArtificalOperatorStateMapper`, `SimpleEndlessSourceWithBloatedState `, 
and `StateCreatingFlatMap`, they just use union state for end-to-end test 
verifying. 
   
   For `FlinkKafkaProducer` and `FlinkKafkaProducer011`, the union 
`nextTransactionalIdHintState` is just the same for all sub-tasks.
   
   For `KafkaConsumer`, since Kafaka does support to decrease partition counts, 
the kafka partition is sticky to the subtask  when parallslism not changed.
   
   I think above operators would not meet conflict case.
   
   For `SequenceGeneratorSource`, it would initialize state with max event 
time. However, `monotonousEventTime` only increase by fixed step of 
`eventTimeClockProgressPerEvent`, which should be the same for all sub-tasks.
   
   For `StreamingFileSink`, it also get the max counter, but from the 
defination of `StreamingFileSink`, when restoring from checkpoint, the restored 
files in `pending` state are transferred into `finished` state, while 
`in-progress` files are rolled back. In other words from my point, partial 
recovery is not suitable for `StreamingFileSink`.
   
   From my point of view, all operators in production code with union state, 
except `StreamingFileSink`, should work fine for partial restore. However, 
since the `getUnionList` API is public for users, we cannot control users' 
behavior. In a nutshell, if we support to restore state when using 
`RestartPipelinedRegionStrategy`, we should add limitation for union state.
   
   I plan to add another parameter, which might be `RecoveryMode`, in the 
`restoreLatestCheckpointedState` method. For `RestartAllStrategy` and overall 
only one region's `RestartPipelinedRegionStrategy`, it's `RecoveryMode.ALL`; 
for other failover strategies, it's `RecoveryMode.PARTIAL`. By means of this, 
when assigning state, if we found `RecoveryMode.PARTIAL` and union state 
existed, unsupported exception could be thrown out. What do you think?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to