Stefan Richter created FLINK-11141:
--------------------------------------

             Summary: Key generation for RocksDBMapState can theoretically be ambiguous
                 Key: FLINK-11141
                 URL: https://issues.apache.org/jira/browse/FLINK-11141
             Project: Flink
          Issue Type: Bug
          Components: State Backends, Checkpointing
    Affects Versions: 1.7.0, 1.6.2, 1.5.5
            Reporter: Stefan Richter


RocksDBMapState stores values in RocksDB under a composite key built from the 
serialized bytes of {{key-group-id|key|namespace|user-key}}. In this 
composition, key, namespace, and user-key can each have either a fixed-size or 
a variable-size serialization format. If at least two of them are 
variable-size, different combinations can serialize to the same composite key, 
e.g.:

abcd <-> efg
abc <-> defg

Both pairs concatenate to the same byte sequence, abcdefg.
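
The collision above can be reproduced with a minimal sketch. It assumes plain UTF-8 strings stand in for the serialized key and namespace; the class and method names are hypothetical and not part of Flink.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Flink.
class NaiveCompositeKey {

    // Concatenates the UTF-8 bytes of two parts with no separator or length.
    static byte[] concat(String first, String second) {
        byte[] a = first.getBytes(StandardCharsets.UTF_8);
        byte[] b = second.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream(a.length + b.length);
        out.write(a, 0, a.length);
        out.write(b, 0, b.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] k1 = concat("abcd", "efg");
        byte[] k2 = concat("abc", "defg");
        // Two different (key, namespace) pairs yield identical bytes.
        System.out.println(Arrays.equals(k1, k2)); // true
    }
}
```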

Our code handles this for all other state types, where composite keys consist 
only of key and namespace: it checks whether both serializers are 
variable-size and, if so, appends the serialized length to each byte sequence.
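
A rough sketch of the length-appending idea for the key and namespace case follows. It is not the actual Flink code: the class name, the use of UTF-8 strings as stand-ins for serialized values, and the 4-byte length encoding are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Flink.
class LengthSuffixedKey {

    // Writes each part followed by its length as a 4-byte big-endian int
    // (the length format is an assumption for this sketch).
    static byte[] encode(String key, String namespace) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeWithLength(out, key);
            writeWithLength(out, namespace);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    private static void writeWithLength(DataOutputStream out, String part)
            throws IOException {
        byte[] raw = part.getBytes(StandardCharsets.UTF_8);
        out.write(raw);
        out.writeInt(raw.length);
    }

    public static void main(String[] args) {
        byte[] k1 = encode("abcd", "efg");
        byte[] k2 = encode("abc", "defg");
        // The appended lengths make the two composite keys differ.
        System.out.println(Arrays.equals(k1, k2)); // false
    }
}
```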

However, for map state the user-key is included neither in the check for 
potential ambiguity nor in the appending of lengths. This means that, in 
theory, some combinations can produce colliding composite keys in RocksDB. The 
fix is to include the user-key serializer in the check and to append the 
length there as well.
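
One possible shape for the extended check is sketched below. It assumes each serializer reports a negative length for variable-size formats (modeled on the convention of TypeSerializer#getLength, stated here as an assumption); the class and method names are hypothetical and this is not the actual fix.

```java
// Hypothetical demo class, not part of Flink.
class AmbiguityCheck {

    // Assumption: a negative serialized length marks a variable-size format.
    static boolean isVariableSize(int serializedLength) {
        return serializedLength < 0;
    }

    // Ambiguity is possible once at least two of the three components
    // (key, namespace, user-key) are variable-size.
    static boolean ambiguousKeyPossible(int keyLength, int namespaceLength, int userKeyLength) {
        int variableCount = 0;
        if (isVariableSize(keyLength)) {
            variableCount++;
        }
        if (isVariableSize(namespaceLength)) {
            variableCount++;
        }
        if (isVariableSize(userKeyLength)) {
            variableCount++;
        }
        return variableCount >= 2;
    }

    public static void main(String[] args) {
        // Fixed-size key, variable-size namespace and user-key: a case that a
        // check limited to key and namespace would not flag.
        System.out.println(ambiguousKeyPossible(8, -1, -1)); // true
    }
}
```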

Please note that this cannot simply be changed, because it has implications 
for backwards compatibility and requires some form of migration of the state 
keys on restore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
