[ https://issues.apache.org/jira/browse/KAFKA-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063945#comment-16063945 ]
Guozhang Wang commented on KAFKA-4750:
--------------------------------------

[~mjsax] [~evis] [~mihbor] Thanks for your comments. Before reviewing [~evis]'s patch, I would like to think a bit more about the general resolution for this case:

1. In Kafka messages, "null" byte arrays indicate tombstones. This means that if a user's serde serializes any object into null for a log-compacted topic (e.g. the changelog topic of a state store), the record is effectively deleted from the store.

2. In Kafka Streams state stores, the javadoc does NOT specify whether "null" indicates deletion:

{code}
/**
 * Update the value associated with this key
 *
 * @param key The key to associate the value to
 * @param value The value, it can be null.
 * @throws NullPointerException If null is used for key.
 */
void put(K key, V value);
{code}

However, our implementation does treat a "null" value object (note: a null object, not a "null" byte array as in serialized messages) as a deletion, since we implement {{delete(key)}} as {{put(key, null)}}. As Evgeny and Michal mentioned, it would be intuitive if our {{put}} semantics aligned with Java's map operations:

{code}
...
// store initialized as empty
store.get(key);          // returns null
store.put(key, value);
store.delete(key);
store.get(key);          // returns null
store.put(key, value);
store.put(key, null);    // we can interpret this as "associate the key with null" or simply "delete this key"
store.get(key);          // returns null; in general this could mean either that the key is associated with a null value or that the key does not exist
{code}

Now suppose you have a custom serde that maps a "null" object to non-null byte arrays. In this case the above still holds:

{code}
store.put(key, value);
store.put(key, null);    // now the "null" object is just a special value that does not indicate deletion
store.get(key);          // returns null, but this should be interpreted as "the key is associated with null"
{code}

Now suppose you have a custom serde that maps some non-null object to "null" byte arrays. In this case the non-null object is really just a dummy value, and the above still holds:

{code}
store.put(key, value);
store.put(key, MY_DUMMY); // serialized into a "null" byte array
store.get(key);           // returns MY_DUMMY, since the "null" byte array is deserialized symmetrically
{code}

So I think if we want to allow the customized interpretations above, we should not implement {{delete()}} as {{put(key, null)}}, since "null" objects may not indicate deletions. If we want to be more restrictive, we should state it explicitly in the javadoc above: "@param value The value, it can be null which indicates deletion of the key". WDYT?

> KeyValueIterator returns null values
> ------------------------------------
>
> Key: KAFKA-4750
> URL: https://issues.apache.org/jira/browse/KAFKA-4750
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 0.10.1.1, 0.11.0.0, 0.10.2.1
> Reporter: Michal Borowiecki
> Assignee: Evgeny Veretennikov
> Labels: newbie
> Attachments: DeleteTest.java
>
>
> The API for the ReadOnlyKeyValueStore.range method promises that the returned
> iterator will not return null values.
> However, after upgrading from 0.10.0.0 to 0.10.1.1, we found that null values
> are returned, causing NPEs on our side.
> I found this happens after removing entries from the store, and it resembles
> the SAMZA-94 defect. The problem seems to be the same as there: when deleting
> entries with a serializer that does not return null when null is passed in,
> the state store doesn't actually delete that key/value pair, but the iterator
> returns a null value for that key.
> When I modified our serializer to return null when null is passed in, the
> problem went away. However, I believe this should be fixed in Kafka Streams,
> perhaps with an approach similar to SAMZA-94.
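The serializer interaction described above can be reproduced without Kafka. Below is a minimal Java sketch, assuming a toy in-memory store rather than the real Kafka Streams {{KeyValueStore}}: {{ToyStore}}, {{encodeNullAsEmpty}}, {{decodeEmptyAsNull}}, {{nullPreserving}} and {{deleteDirectly}} are hypothetical names, and the serde is modelled as plain {{java.util.function.Function}} objects instead of Kafka's {{Serializer}}/{{Deserializer}} interfaces. The toy store treats null bytes as a tombstone and implements {{delete(key)}} as {{put(key, null)}}, so a serializer that encodes a null object as non-null bytes leaves the entry in place, and the range scan then yields it with a null value.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

public class NullValueDemo {
    // Hypothetical toy store (NOT the Kafka Streams API): values are kept as
    // serialized bytes in a sorted map, and a null byte array is treated as a
    // tombstone, mirroring the Kafka convention for null bytes.
    static final class ToyStore {
        private final TreeMap<String, byte[]> bytes = new TreeMap<>();
        private final Function<String, byte[]> serializer;
        private final Function<byte[], String> deserializer;

        ToyStore(Function<String, byte[]> serializer, Function<byte[], String> deserializer) {
            this.serializer = serializer;
            this.deserializer = deserializer;
        }

        void put(String key, String value) {
            byte[] raw = serializer.apply(value);
            if (raw == null) {
                bytes.remove(key);   // null bytes act as a tombstone
            } else {
                bytes.put(key, raw); // anything else is stored, even if it encodes a null object
            }
        }

        // delete(key) implemented as put(key, null), as described in the comment.
        void delete(String key) {
            put(key, null);
        }

        // Alternative: delete without routing through the serde at all.
        void deleteDirectly(String key) {
            bytes.remove(key);
        }

        // Inclusive range scan in key order; SimpleImmutableEntry permits null values.
        List<Map.Entry<String, String>> range(String from, String to) {
            List<Map.Entry<String, String>> out = new ArrayList<>();
            for (Map.Entry<String, byte[]> e : bytes.subMap(from, true, to, true).entrySet()) {
                out.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), deserializer.apply(e.getValue())));
            }
            return out;
        }
    }

    // A serializer that does not return null for null: it encodes null as an empty array.
    static byte[] encodeNullAsEmpty(String s) {
        return s == null ? new byte[0] : s.getBytes(StandardCharsets.UTF_8);
    }

    // The symmetric deserializer: an empty array decodes back to null.
    static String decodeEmptyAsNull(byte[] b) {
        return b.length == 0 ? null : new String(b, StandardCharsets.UTF_8);
    }

    // Hypothetical wrapper for the reporter's workaround: a null object becomes null bytes.
    static Function<String, byte[]> nullPreserving(Function<String, byte[]> inner) {
        return v -> v == null ? null : inner.apply(v);
    }

    public static void main(String[] args) {
        ToyStore buggy = new ToyStore(NullValueDemo::encodeNullAsEmpty, NullValueDemo::decodeEmptyAsNull);
        buggy.put("a", "1");
        buggy.put("b", "2");
        buggy.delete("a");
        System.out.println(buggy.range("a", "z")); // [a=null, b=2]: "a" was not removed

        buggy.deleteDirectly("a");
        System.out.println(buggy.range("a", "z")); // [b=2]

        ToyStore fixed = new ToyStore(nullPreserving(NullValueDemo::encodeNullAsEmpty), NullValueDemo::decodeEmptyAsNull);
        fixed.put("a", "1");
        fixed.put("b", "2");
        fixed.delete("a");
        System.out.println(fixed.range("a", "z")); // [b=2]
    }
}
```

In this sketch either change avoids the null value: wrapping the serializer so that null stays null (the reporter's workaround), or deleting without going through {{put(key, null)}} (one of the two options raised in the comment). Only the second option keeps working for an arbitrary user serde.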