GitHub user tillrohrmann commented on the issue:

    https://github.com/apache/flink/pull/2194
  
    Really good work @uce. The code is well structured and thoroughly tested. I only had some minor comments.
    
    While testing the code with the streaming state machine job, I stumbled across a problem, though: recovering from a Flink 1.0 savepoint does not work if the job contains a `keyBy` operation. The reason is that Flink 1.0 had a faulty murmur hash implementation, and correcting it changed the mapping of keys to subtasks. Consequently, the restored state no longer matches the key spaces assigned to each operator. This is the problematic [commit](https://github.com/apache/flink/commit/641a0d436c9b7a34ff33ceb370cf29962cac4dee).
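The effect of changing the hash can be illustrated with a minimal Java sketch. It assumes, as a simplification, that a key's subtask is `floorMod(hash(key.hashCode()), parallelism)`. `KeyRemapDemo` and all of its methods are hypothetical names, not Flink's API. `murmur3` is the standard 32-bit MurmurHash3 of a single int with seed 0. `murmur3WithoutFinalizer` is a made-up "flawed" variant that skips the final avalanche step; it does not reproduce the actual Flink 1.0 bug. Any difference between the old and new hash is enough to move keys, so the state restored onto a subtask no longer lines up with the keys that subtask now receives.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Flink code: a key's owning subtask is derived
// from a hash of key.hashCode(), so changing the hash changes the owner.
final class KeyRemapDemo {

    // MurmurHash3 (x86, 32-bit) mixing of a single 4-byte input with seed 0,
    // stopping just before the final avalanche (fmix32) step.
    private static int murmur3Body(int input) {
        int k = input;
        k *= 0xcc9e2d51;
        k = Integer.rotateLeft(k, 15);
        k *= 0x1b873593;

        int h = k; // seed 0, so seed ^ k == k
        h = Integer.rotateLeft(h, 13);
        h = h * 5 + 0xe6546b64;

        return h ^ 4; // XOR in the input length (4 bytes)
    }

    // Final avalanche step of MurmurHash3.
    private static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Complete 32-bit MurmurHash3 of one int.
    static int murmur3(int input) {
        return fmix32(murmur3Body(input));
    }

    // Hypothetical "flawed" variant that skips the finalizer
    // (an assumption for illustration, not the actual Flink 1.0 bug).
    static int murmur3WithoutFinalizer(int input) {
        return murmur3Body(input);
    }

    // Maps a hash to a subtask index in [0, parallelism).
    static int subtask(int hash, int parallelism) {
        return Math.floorMod(hash, parallelism);
    }

    // Keys whose owning subtask differs between the old and the new hash.
    static List<String> movedKeys(List<String> keys, int parallelism) {
        List<String> moved = new ArrayList<>();
        for (String key : keys) {
            int before = subtask(murmur3WithoutFinalizer(key.hashCode()), parallelism);
            int after = subtask(murmur3(key.hashCode()), parallelism);
            if (before != after) {
                moved.add(key);
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            keys.add("key-" + i);
        }
        int moved = movedKeys(keys, 4).size();
        System.out.println(moved + " of " + keys.size() + " keys change subtask at parallelism 4");
    }
}
```

Running `main` reports how many of 100 sample keys change owner at parallelism 4. In this model, a parallelism of 1 can never move a key, because every hash maps to subtask 0.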
    
    Thus, this change actually breaks our backwards compatibility with respect to savepoints. I see three possible ways to solve the problem:
    1. Revert the changes of this commit. However, we don't know how well the flawed murmur hash performs.
    2. Develop a tool that can repartition savepoints.
    3. Don't support backwards compatibility between versions 1.0 and 1.1.
    
    I think option 3 is not feasible given our backwards compatibility promise. Furthermore, option 2 is not really straightforward if the user has a keyed stream and uses the `Checkpointed` interface. Given the upcoming release, I think option 1 is the best way to solve the problem.
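The contrast in option 2 can be sketched in Java. For state stored per key, a repartitioning tool only needs to rehash every entry to its new owner. This minimal sketch assumes the per-key state is available as one `Map<K, V>` per old subtask and that the new assignment is `floorMod(hash(key.hashCode()), parallelism)`. `SavepointRepartitionSketch` and `repartition` are hypothetical names, not part of Flink. State written through the `Checkpointed` interface is a single user-defined snapshot object per subtask with no per-key structure, so there is no key to feed into `hash`. That is what makes option 2 hard in that case.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch, not a Flink tool: redistributes per-key state entries
// from the old subtask layout to the layout implied by a (new) hash function.
final class SavepointRepartitionSketch {

    static <K, V> List<Map<K, V>> repartition(
            List<Map<K, V>> oldSubtaskStates, int parallelism, IntUnaryOperator hash) {
        // One empty state map per new subtask.
        List<Map<K, V>> newSubtaskStates = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            newSubtaskStates.add(new HashMap<>());
        }
        // Every entry carries its key, so its new owner can be recomputed.
        for (Map<K, V> subtaskState : oldSubtaskStates) {
            for (Map.Entry<K, V> entry : subtaskState.entrySet()) {
                int target = Math.floorMod(hash.applyAsInt(entry.getKey().hashCode()), parallelism);
                newSubtaskStates.get(target).put(entry.getKey(), entry.getValue());
            }
        }
        return newSubtaskStates;
    }

    public static void main(String[] args) {
        Map<String, Integer> subtask0 = new HashMap<>();
        subtask0.put("a", 1);
        subtask0.put("b", 2);
        Map<String, Integer> subtask1 = new HashMap<>();
        subtask1.put("c", 3);
        List<Map<String, Integer>> old = List.of(subtask0, subtask1);
        // Identity hash as a stand-in for the corrected hash function.
        System.out.println(repartition(old, 2, h -> h));
    }
}
```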

