Greg Fodor created KAFKA-4120:
---------------------------------

             Summary: byte[] keys in RocksDB state stores do not work as 
expected
                 Key: KAFKA-4120
                 URL: https://issues.apache.org/jira/browse/KAFKA-4120
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 0.10.0.1
            Reporter: Greg Fodor
            Assignee: Guozhang Wang


We ran into an issue using a byte[] key in a RocksDB state store (with the byte 
array serde.) Internally, the RocksDB store keeps an LRUCache, backed by a 
LinkedHashMap, that sits between callers and the actual db. The problem is 
that while the underlying RocksDB persists byte arrays with equal contents as 
the same key, the LinkedHashMap compares byte[] keys by reference, since 
arrays inherit Object's identity-based equals/hashCode. As a result, the cache 
can hold multiple entries for byte arrays that have the same contents and map 
to the same key in the db, leading to unexpected behavior. 
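A minimal standalone illustration (not Kafka code) of the reference-equality behavior described above: two arrays with identical contents produce two distinct entries in a LinkedHashMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ByteArrayKeyDemo {
    public static void main(String[] args) {
        Map<byte[], String> cache = new LinkedHashMap<>();
        byte[] k1 = new byte[]{1, 2, 3};
        byte[] k2 = new byte[]{1, 2, 3}; // same contents, different reference

        cache.put(k1, "first");
        cache.put(k2, "second"); // does NOT overwrite: arrays hash by identity

        System.out.println(cache.size());  // 2 -- two entries for "one" db key
        System.out.println(cache.get(k1)); // first
        System.out.println(cache.get(k2)); // second
    }
}
```

In RocksDB itself both puts would land on the same key, which is exactly the mismatch this issue describes.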

One behavior that manifests from this: if you store a value in the state store 
under a specific key and then re-read that key with the same byte array 
instance, you get the new value, but if you re-read it with a different byte 
array containing the same bytes, you get a stale value until the db is 
flushed. (This made it particularly tricky to track down what was 
happening :))

The workaround for us is to convert the keys from raw byte arrays to a 
deserialized Avro structure that provides proper hashCode/equals semantics for 
the intermediate cache. In general this seems like good practice, so one of the 
proposed solutions is to simply emit a warning or exception if a key type with 
broken semantics like this is provided.
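For illustration, a thin wrapper along these lines (the BytesKey name is hypothetical, not something from Kafka or this report) would give the cache value-based key semantics without requiring a full Avro structure:

```java
import java.util.Arrays;

// Hypothetical wrapper giving byte[] contents-based equality, suitable as a
// map key. Arrays.equals/Arrays.hashCode compare array contents, not
// references.
public final class BytesKey {
    private final byte[] bytes;

    public BytesKey(byte[] bytes) {
        this.bytes = bytes;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof BytesKey && Arrays.equals(bytes, ((BytesKey) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}
```

Two BytesKey instances built over equal arrays then collide in the cache, matching RocksDB's own key semantics.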

A few proposed solutions:

- When the state store is defined on array keys, ensure that the cache map does 
proper comparisons on array values, not array references. This would fix the 
problem, but seems a bit strange to special case. However, I have a hard time 
thinking of other examples where this behavior would burn users.

- Change the LRU cache to serialize/deserialize all keys to bytes and use a 
value-based comparison for the map. This would be the most correct, as it would 
ensure that both RocksDB and the cache have identical key spaces and 
equality/hashing semantics. However, this is probably slow, and since the 
general case of using Avro record types as keys works fine, it will largely be 
unnecessary overhead.

- Don't change anything about the behavior, but log a warning or fail to start 
if a state store is defined on array keys (or possibly any key type that fails 
to properly override Object.equals/hashCode.)
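The first option could be sketched roughly as follows; all class and method names here are illustrative, not Kafka's actual cache internals. It wraps byte[] keys internally so the LinkedHashMap compares contents, while eviction still works via removeEldestEntry:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an LRU cache that compares byte[] keys by contents rather than
// by reference. Names are hypothetical.
public class ContentAddressedLruCache {
    private final int maxEntries;
    private final LinkedHashMap<Wrapper, byte[]> map;

    public ContentAddressedLruCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder = true gives LRU iteration order; removeEldestEntry
        // evicts once the cache exceeds maxEntries.
        this.map = new LinkedHashMap<Wrapper, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Wrapper, byte[]> eldest) {
                return size() > ContentAddressedLruCache.this.maxEntries;
            }
        };
    }

    public void put(byte[] key, byte[] value) {
        map.put(new Wrapper(key), value);
    }

    public byte[] get(byte[] key) {
        return map.get(new Wrapper(key));
    }

    // Internal wrapper supplying value-based equals/hashCode for byte[].
    private static final class Wrapper {
        final byte[] k;
        Wrapper(byte[] k) { this.k = k; }
        @Override public boolean equals(Object o) {
            return o instanceof Wrapper && Arrays.equals(k, ((Wrapper) o).k);
        }
        @Override public int hashCode() { return Arrays.hashCode(k); }
    }
}
```

With this shape, a get with a freshly deserialized byte array hits the entry written under a different array instance, so the stale-read described above cannot occur.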





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)