[ 
https://issues.apache.org/jira/browse/FLINK-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362673#comment-15362673
 ] 

Greg Hogan commented on FLINK-4154:
-----------------------------------

This is a keen observation. Considering the usage and looking at the algorithm 
the right shifts will pull entropy into the lower bits before we take the 
modulus for the channel number. This might be something we note in the code and 
keep forever unless we have other breaking changes in saved state for 2.0.

> Correction of murmur hash breaks backwards compatibility
> --------------------------------------------------------
>
>                 Key: FLINK-4154
>                 URL: https://issues.apache.org/jira/browse/FLINK-4154
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.1.0
>            Reporter: Till Rohrmann
>            Priority: Blocker
>             Fix For: 1.1.0
>
>
> The correction of Flink's murmur hash with commit [1], breaks Flink's 
> backwards compatibility with respect to savepoints. The reason is that the 
> changed murmur hash which is used to partition elements in a {{KeyedStream}} 
> changes the mapping from keys to sub tasks. This changes the assigned key 
> spaces for a sub task. Consequently, an old savepoint (version 1.0) assigns 
> states with a different key space to the sub tasks.
> I think that this must be fixed for the upcoming 1.1 release. I see two 
> options to solve the problem:
> -  revert the changes, but then we don't know how the flawed murmur hash 
> performs
> - develop tooling to repartition state of old savepoints. This is probably 
> not trivial since a keyed stream can also contain non-partitioned state which 
> is not partitionable in all cases. And even if only partitioned state is 
> used, we would need some kind of special operator which can repartition the 
> state wrt the key.
> I think that the latter option requires some more thoughts and is thus 
> unlikely to be done before the release 1.1. Therefore, as a workaround, I 
> think that we should revert the murmur hash changes.
> [1] 
> https://github.com/apache/flink/commit/641a0d436c9b7a34ff33ceb370cf29962cac4dee



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to