[ 
https://issues.apache.org/jira/browse/KAFKA-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063945#comment-16063945
 ] 

Guozhang Wang commented on KAFKA-4750:
--------------------------------------

[~mjsax][~evis] [~mihbor] Thanks for your comments. I would like to think a bit 
more on the general resolution for this case though before reviewing [~evis]'s 
patch:

1. In Kafka messages, "null" byte arrays indicate tombstones. Note that this 
means that if a user's serde serializes an object into null for a 
log-compacted topic (e.g. the changelog topic of a state store), the effect is 
to delete the record from the store.

2. In Kafka Streams state stores, the javadoc does NOT specify whether "null" 
indicates deletion:

{code}
    /**
     * Update the value associated with this key
     *
     * @param key The key to associate the value to
     * @param value The value, it can be null.
     * @throws NullPointerException If null is used for key.
     */
    void put(K key, V value);
{code}

However, our implementation does treat a value-typed "null" (note that this is 
not the "null" byte array of a serialized message) as a deletion, since we 
implement {{delete(key)}} as {{put(key, null)}}. As Evgeny / Michal mentioned, 
it would be more intuitive if our {{put}} semantics aligned with Java's map 
operations:

{code}
...  // store initialized as empty

store.get(key); // returns null

store.put(key, value);
store.delete(key);
store.get(key);  // returns null

store.put(key, value);
// we can interpret this as "associate the key with null" or simply as
// "delete this key"
store.put(key, null);
// returns null, though generally speaking it could indicate either that the
// key is associated with a null value or that the key does not exist
store.get(key);
{code}
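
For comparison, here is a minimal sketch of how {{java.util.HashMap}} behaves in the same sequence. The class and method names are hypothetical and exist only for illustration. {{get}} returns null in both cases, and only {{containsKey}} tells "associated with null" apart from "absent":

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class MapNullSemantics {
    // Returns {get(key) == null, containsKey(key)} after put(key, null).
    static boolean[] afterPutNull() {
        Map<String, String> map = new HashMap<>();
        map.put("k", "v");
        map.put("k", null);  // the key stays in the map, now mapped to null
        return new boolean[] { map.get("k") == null, map.containsKey("k") };
    }

    // Returns {get(key) == null, containsKey(key)} after remove(key).
    static boolean[] afterRemove() {
        Map<String, String> map = new HashMap<>();
        map.put("k", "v");
        map.remove("k");     // the key is gone from the map
        return new boolean[] { map.get("k") == null, map.containsKey("k") };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(afterPutNull())); // [true, true]
        System.out.println(Arrays.toString(afterRemove()));  // [true, false]
    }
}
```

Note that not every {{Map}} implementation accepts null values, so this only describes {{HashMap}}.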

Now assume you have a customized serde that maps a "null" object to "not-null" 
byte arrays. In this case the above would still hold:

{code}
store.put(key, value);
// now the "null" object is just a special value that does not indicate
// deletion
store.put(key, null);
// returns null, but this should be interpreted as "the key is associated
// with null"
store.get(key);
{code}
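
To make this case concrete, here is a rough sketch using a toy in-memory store rather than the actual Kafka Streams classes. The class name, the serde (null object becomes an empty byte array, assuming empty strings are never stored) and the helper methods are all hypothetical. It shows that when {{delete(key)}} is implemented as {{put(key, null)}}, such a serde leaves the entry in the store, and iteration then yields a null value, which is the behavior reported in this ticket:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class NullMarkerStoreSketch {
    // Toy stand-in for the underlying bytes store.
    private final Map<String, byte[]> bytesStore = new TreeMap<>();

    // Hypothetical serde: a null object maps to a non-null (empty) byte array.
    static byte[] serialize(String value) {
        return value == null ? new byte[0] : value.getBytes(StandardCharsets.UTF_8);
    }

    static String deserialize(byte[] bytes) {
        if (bytes == null || bytes.length == 0) {
            return null;
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    void put(String key, String value) {
        byte[] raw = serialize(value);
        if (raw == null) {
            bytesStore.remove(key);   // only null *bytes* remove the entry
        } else {
            bytesStore.put(key, raw);
        }
    }

    // delete implemented as put(key, null), as described above
    void delete(String key) {
        put(key, null);
    }

    String get(String key) {
        return deserialize(bytesStore.get(key));
    }

    int size() {
        return bytesStore.size();
    }

    // Deserialized values of all entries still present in the bytes store.
    List<String> allValues() {
        List<String> values = new ArrayList<>();
        for (byte[] raw : bytesStore.values()) {
            values.add(deserialize(raw));
        }
        return values;
    }

    public static void main(String[] args) {
        NullMarkerStoreSketch store = new NullMarkerStoreSketch();
        store.put("a", "1");
        store.delete("a");
        System.out.println(store.get("a"));     // null
        System.out.println(store.size());       // 1, the entry was not removed
        System.out.println(store.allValues());  // [null]
    }
}
```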

Now assume you have a customized serde that maps a "not-null" object to "null" 
byte arrays. In this case the "not-null" object is really interpreted as a 
dummy value, and the above still holds:

{code}
store.put(key, value);
store.put(key, MY_DUMMY);  // serialized into "null" byte arrays
// returns MY_DUMMY, as "null" byte arrays are deserialized symmetrically
store.get(key);
{code}

So I think if we want to allow the above customized interpretations, then we 
should not implement {{delete()}} as {{put(key, null)}}, since "null" objects 
may not indicate deletions; if we want to be stricter, then we should 
emphasize that in the javadoc above, e.g. "@param value The value, it can be 
null which indicates deletion of the key".
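
The two options could be sketched roughly as follows. This is a toy in-memory store, not the actual Kafka Streams code, and the class name, the {{putStrict}} method and the serializer function are hypothetical names used only for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class DeleteSemanticsSketch {
    private final Map<String, byte[]> bytesStore = new HashMap<>();
    private final Function<String, byte[]> serializer;

    DeleteSemanticsSketch(Function<String, byte[]> serializer) {
        this.serializer = serializer;
    }

    // Option 1: put() stores whatever the serde produces (null bytes act as
    // a tombstone), and delete() never goes through the serde.
    void put(String key, String value) {
        byte[] raw = serializer.apply(value);
        if (raw == null) {
            bytesStore.remove(key);
        } else {
            bytesStore.put(key, raw);
        }
    }

    void delete(String key) {
        bytesStore.remove(key);
    }

    // Option 2 (stricter): a null object always means deletion, checked
    // before the serde is consulted.
    void putStrict(String key, String value) {
        if (value == null) {
            bytesStore.remove(key);
        } else {
            put(key, value);
        }
    }

    boolean contains(String key) {
        return bytesStore.containsKey(key);
    }

    public static void main(String[] args) {
        // Hypothetical serde that maps a null object to non-null bytes.
        DeleteSemanticsSketch store = new DeleteSemanticsSketch(
            v -> v == null ? new byte[0] : v.getBytes(StandardCharsets.UTF_8));

        store.put("k", null);
        System.out.println(store.contains("k")); // true, null is just a value
        store.delete("k");
        System.out.println(store.contains("k")); // false

        store.putStrict("k", "v");
        store.putStrict("k", null);
        System.out.println(store.contains("k")); // false, null means delete
    }
}
```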

WDYT?

> KeyValueIterator returns null values
> ------------------------------------
>
>                 Key: KAFKA-4750
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4750
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.1.1, 0.11.0.0, 0.10.2.1
>            Reporter: Michal Borowiecki
>            Assignee: Evgeny Veretennikov
>              Labels: newbie
>         Attachments: DeleteTest.java
>
>
> The API for ReadOnlyKeyValueStore.range method promises the returned iterator 
> will not return null values. However, after upgrading from 0.10.0.0 to 
> 0.10.1.1 we found null values are returned causing NPEs on our side.
> I found this happens after removing entries from the store and I found 
> resemblance to SAMZA-94 defect. The problem seems to be as it was there, when 
> deleting entries and having a serializer that does not return null when null 
> is passed in, the state store doesn't actually delete that key/value pair but 
> the iterator will return null value for that key.
> When I modified our serializer to return null when null is passed in, the 
> problem went away. However, I believe this should be fixed in kafka streams, 
> perhaps with a similar approach as SAMZA-94.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
