gaoyunhaii edited a comment on pull request #16655: URL: https://github.com/apache/flink/pull/16655#issuecomment-893306063
Hi Dawid, many thanks for the review!

> Do you mean successors or predecessors in those cases? I guess you meant predecessors, right?

It should indeed be the predecessors. I'll check the naming overall and change all of them to predecessors. Sorry, I initially thought "processors" also meant something like predecessors.

> I don't get this example. Could you try to explain it a bit more? Do you mean a checkpoint is used by a completely different graph? Is the original graph a disjoint graph? I have a hard time understanding it.

Yes, in this example it is indeed a disjoint graph. In detail: the main issue is that if a part of the keyed state is finished and discarded, then after failover we would like to have no new records with these keys. Suppose we have a job composed of two separate parts:

```
source1 = env.addSource(...)
map1 = source1.forward().map(xx)

source2 = env.addSource(...)
process2 = source2.keyBy().process(xx)
```

Suppose that in the first run `process2` is partly finished; then `source2` must be fully finished, since `keyBy` here is all-to-all. Suppose also that `source1` and `map1` are partly finished in the same checkpoint. When re-running, users may modify the graph. There are several possible situations:

1. Users add a new operator, like

```
source1 = env.addSource(...)
map1 = source1.forward().map(xx)

source3 = env.addSource(...)
process2 = source3.keyBy().process(xx)
```

Then `source3` would be ALL_RUNNING and we could detect this situation.

2. Users use `source1` to replace `source2`:

```
source1 = env.addSource(...)
process2 = source1.keyBy().process(xx)
```

We could also detect this situation, since `source1` is partly finished and `keyBy` is all-to-all.

3. Users use `DataStreamUtils#reinterpretAsKeyedStream()` and use `source1` to replace `source2`:

```
source1 = env.addSource(...)
process2 = DataStreamUtils.reinterpretAsKeyedStream(source1, keySelector).process(xx)
```

Then in the new graph `source1` is partly finished and is connected to `process2` via a `forward` (POINTWISE) edge, so it is not easy to detect this situation.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
