Github user tillrohrmann commented on the issue:
https://github.com/apache/flink/pull/2194
Really good work @uce. The code is well structured and thoroughly tested. I
had only some minor comments.
While testing the code with the streaming state machine job, I stumbled
across a problem, though: recovering from a Flink 1.0 savepoint does not work
if the job contains a `keyBy` operation. The reason is that Flink 1.0 had a
faulty murmur hash implementation, and correcting it changed the mapping of
keys to subtasks. Consequently, the restored state no longer matches the key
space assigned to each operator instance. This is the problematic
[commit](https://github.com/apache/flink/commit/641a0d436c9b7a34ff33ceb370cf29962cac4dee).
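
To make the effect concrete, here is a minimal, illustrative sketch (not
Flink's actual partitioning code; the hash functions, class, and method names
are made up for the example). Keyed state is restored per subtask, and a key
is routed to a subtask based on a hash of the key, so changing the hash
between the version that wrote the savepoint and the version that restores it
moves keys onto subtasks that hold state for a different key range:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: two different integer hash functions stand in for the
// flawed 1.0 murmur hash and the corrected one. The point is that changing
// the hash reassigns keys to different subtasks, so the per-subtask state
// restored from the savepoint no longer covers the keys routed to it.
public class KeyShiftSketch {

    // Stand-in for the old (flawed) hash.
    static int oldHash(int code) {
        code ^= (code >>> 16);
        code *= 0x85ebca6b;
        return code ^ (code >>> 13);
    }

    // Stand-in for the corrected hash.
    static int newHash(int code) {
        code *= 0xcc9e2d51;
        code = Integer.rotateLeft(code, 15);
        code *= 0x1b873593;
        return code ^ (code >>> 16);
    }

    // Assumed routing scheme: non-negative hash modulo parallelism.
    static int subtask(int hash, int parallelism) {
        return (hash & Integer.MAX_VALUE) % parallelism;
    }

    public static void main(String[] args) {
        int parallelism = 4;
        List<String> keys = Arrays.asList("alice", "bob", "carol", "dave");
        for (String key : keys) {
            int before = subtask(oldHash(key.hashCode()), parallelism);
            int after = subtask(newHash(key.hashCode()), parallelism);
            System.out.printf("key=%s  subtask before fix=%d, after fix=%d%n",
                    key, before, after);
        }
        // Any key whose subtask changed is now processed by an operator
        // instance that restored state for a different set of keys.
    }
}
```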
Thus, this change actually breaks our backwards compatibility with respect to
savepoints. To solve the problem, I see three possibilities:

1. Revert the changes of this commit. However, we don't know how well the
flawed murmur hash performs.
2. Develop a tool which can repartition savepoints.
3. Don't support backwards compatibility between versions 1.0 and 1.1.
I think option 3 is not doable given our backwards compatibility promise.
Furthermore, option 2 is not really straightforward if the user has a keyed
stream that also uses the `Checkpointed` interface. Given that the release is
upcoming, I think option 1 would be the best way to solve the problem.