Till Rohrmann created FLINK-4154:
------------------------------------

             Summary: Correction of murmur hash breaks backwards compatibility
                 Key: FLINK-4154
                 URL: https://issues.apache.org/jira/browse/FLINK-4154
             Project: Flink
          Issue Type: Bug
          Components: Distributed Coordination
    Affects Versions: 1.1.0
            Reporter: Till Rohrmann
            Priority: Critical
             Fix For: 1.1.0


The correction of Flink's murmur hash with commit [1], breaks Flink's backwards 
compatibility with respect to savepoints. The reason is that the changed murmur 
hash which is used to partition elements in a {{KeyedStream}} changes the 
mapping from keys to sub tasks. This changes the assigned key spaces for a sub 
task. Consequently, an old savepoint (version 1.0) assigns states with a 
different key space to the sub tasks.

I think that this must be fixed for the upcoming 1.1 release. I see two options 
to solve the problem:

-  revert the changes, but then we don't know how the flawed murmur hash 
performs
- develop tooling to repartition state of old savepoints. This is probably not 
trivial since a keyed stream can also contain non-partitioned state which is 
not partitionable in all cases. And even if only partitioned state is used, we 
would need some kind of special operator which can repartition the state wrt 
the key.

I think that the latter option requires some more thoughts and is thus unlikely 
to be done before the release 1.1. Therefore, as a workaround, I think that we 
should revert the murmur hash changes.

[1] 
https://github.com/apache/flink/commit/641a0d436c9b7a34ff33ceb370cf29962cac4dee



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to