Stefan Richter created FLINK-11141:
--------------------------------------
Summary: Key generation for RocksDBMapState can theoretically be ambiguous
Key: FLINK-11141
URL: https://issues.apache.org/jira/browse/FLINK-11141
Project: Flink
Issue Type: Bug
Components: State Backends, Checkpointing
Affects Versions: 1.7.0, 1.6.2, 1.5.5
Reporter: Stefan Richter
RocksDBMapState stores values in RocksDB under a composite key built from the
serialized bytes of {{key-group-id|key|namespace|user-key}}. In this
composition, key, namespace, and user-key can each have either a fixed-size or
a variable-size serialization format. If at least two of them use variable-size
formats, the composite key can be ambiguous, because different splits
concatenate to the same bytes, e.g.:
abcd <-> efg
abc <-> defg
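To make the collision concrete, here is a minimal sketch. It does not use Flink's serializers; the class {{CompositeKeyAmbiguity}} and the helper {{concat}} are hypothetical, and the sketch simply assumes that each part is written as raw UTF-8 bytes with no length information.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CompositeKeyAmbiguity {

    // Hypothetical helper (not Flink API): concatenates the raw UTF-8 bytes
    // of each part without recording where one part ends and the next begins.
    static byte[] concat(String... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String part : parts) {
            byte[] bytes = part.getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] first = concat("abcd", "efg");
        byte[] second = concat("abc", "defg");

        // Both splits yield the bytes of "abcdefg", so the keys collide.
        System.out.println(Arrays.equals(first, second)); // prints "true"
    }
}
```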
Our code handles this for all other state types, where the composite key
consists only of key and namespace: it checks whether both serializers are
variable-size and, if so, appends the serialized length to each byte sequence.
However, for map state, the user-key is included neither in the check for
potential ambiguity nor in the appending of the size. This means that, in
theory, some combinations can produce colliding composite keys in RocksDB. The
fix is to include the user-key serializer in the check and to append the
length there as well.
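As a rough illustration of the idea (not the actual Flink implementation), the sketch below appends a 4-byte length after each variable-size part, including the user-key. The class {{LengthTaggedCompositeKey}} and the method {{compose}} are hypothetical, the key-group prefix is omitted, and strings stand in for arbitrary serialized objects.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LengthTaggedCompositeKey {

    // Hypothetical helper (not Flink API): writes each part followed by its
    // byte length, so part boundaries can be recovered unambiguously.
    static byte[] compose(String key, String namespace, String userKey) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (String part : new String[] {key, namespace, userKey}) {
                byte[] bytes = part.getBytes(StandardCharsets.UTF_8);
                out.write(bytes);
                out.writeInt(bytes.length); // appended length, 4 bytes big-endian
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown to keep the sketch simple.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Same key, but namespace and user-key split differently.
        byte[] first = compose("k", "abcd", "efg");
        byte[] second = compose("k", "abc", "defg");

        // The appended lengths differ, so the composite keys no longer collide.
        System.out.println(Arrays.equals(first, second)); // prints "false"
    }
}
```

Without the {{writeInt}} call, both inputs would serialize to the same bytes. Because the lengths become part of the stored key bytes, keys written in the old format would not match keys written in the new one.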
Please note that this cannot simply be changed, because it affects backwards
compatibility and requires some form of migration of the state keys on
restore.