A. Sophie Blee-Goldman created KAFKA-14460:
----------------------------------------------
Summary: In-memory store iterators can return results with null values
Key: KAFKA-14460
URL: https://issues.apache.org/jira/browse/KAFKA-14460
Project: Kafka
Issue Type: Bug
Components: streams
Reporter: A. Sophie Blee-Goldman
Due to the thread-safety model we adopted in our in-memory stores to avoid
scaling issues, we synchronize all read/write methods and, during range scans,
copy the keyset of all results rather than returning a direct iterator over the
underlying map. When users call #next to read out the iterator results, we
issue a point lookup on the next key and then simply return a
new KeyValue<>(key, get(key)).
This lets the range scan return results without blocking other threads' access
to the store and without risk of a ConcurrentModificationException, since a
writer can modify the real store without affecting the iterator's copy of the
keyset. It also means that keys added or removed after the scan starts won't be
reflected in the set of keys the iterator visits, which in itself is fine as we
don't guarantee consistency semantics of any kind.
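A minimal sketch of why copying the keyset avoids ConcurrentModification, assuming a plain java.util.TreeMap as the underlying map (the real store's map type and locking are not shown). It simulates the interleaving in a single thread: a fail-fast iterator over the live key set throws once the map is modified mid-scan, whereas iterating over a copy keeps going and still visits the deleted key. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class KeySetCopyDemo {
    // Builds a small sorted map standing in for the store's underlying map.
    static NavigableMap<String, String> sampleMap() {
        NavigableMap<String, String> map = new TreeMap<>();
        map.put("a", "1");
        map.put("b", "2");
        map.put("c", "3");
        return map;
    }

    // Iterates the live key set and deletes "c" mid-scan; TreeMap's fail-fast iterator throws.
    static boolean liveIteratorThrows() {
        NavigableMap<String, String> map = sampleMap();
        try {
            for (String key : map.keySet()) {
                map.remove("c"); // simulated write by another thread
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterates over a copy of the key set instead: no exception, and the deleted key is still visited.
    static List<String> copiedKeysVisited() {
        NavigableMap<String, String> map = sampleMap();
        List<String> visited = new ArrayList<>();
        for (String key : new ArrayList<>(map.keySet())) {
            map.remove("c"); // simulated write by another thread
            visited.add(key);
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(liveIteratorThrows()); // prints true
        System.out.println(copiedKeysVisited()); // prints [a, b, c]
    }
}
```

Note that the copy still yields "c" after it was removed from the map; a point lookup on that key at that moment would return null.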
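However, we _do_ guarantee that range scans "must not return null values", and
this contract can be violated if the StreamThread deletes a record after the
iterator has copied the keyset but before #next reaches that key: the point
lookup then returns null, and the iterator hands back a KeyValue with a null
value.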
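The following sketch reproduces the violation under simplified assumptions: a hypothetical SketchStore with synchronized methods over a TreeMap, a hypothetical KV class standing in for Kafka's KeyValue, and a delete issued between creating the iterator and draining it (standing in for the StreamThread). It is not the actual in-memory store code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class NullValueDemo {
    // Hypothetical stand-in for Kafka's KeyValue<K, V>.
    static final class KV {
        final String key;
        final String value;

        KV(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Hypothetical simplified store: synchronized reads/writes, range scans over a key-set copy.
    static final class SketchStore {
        private final NavigableMap<String, String> map = new TreeMap<>();

        synchronized void put(String key, String value) { map.put(key, value); }
        synchronized void delete(String key) { map.remove(key); }
        synchronized String get(String key) { return map.get(key); }

        synchronized Iterator<KV> range(String from, String to) {
            // Copy the matching keys so later writes cannot invalidate the iterator.
            Iterator<String> keys = new ArrayList<>(map.subMap(from, true, to, true).keySet()).iterator();
            return new Iterator<KV>() {
                @Override
                public boolean hasNext() {
                    return keys.hasNext();
                }

                @Override
                public KV next() {
                    String key = keys.next();
                    // Point lookup at read time: null if the key was deleted after the copy.
                    return new KV(key, get(key));
                }
            };
        }
    }

    // Starts a scan, deletes "b" (standing in for the StreamThread), then drains the iterator.
    static List<String> scanWithConcurrentDelete() {
        SketchStore store = new SketchStore();
        store.put("a", "1");
        store.put("b", "2");
        store.put("c", "3");
        Iterator<KV> it = store.range("a", "c");
        store.delete("b");
        List<String> values = new ArrayList<>();
        while (it.hasNext()) {
            values.add(it.next().value);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(scanWithConcurrentDelete()); // prints [1, null, 3]
    }
}
```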
tl;dr: the in-memory store iterators should check get(key) for null and, when it
is null, skip ahead to the next key. See for example InMemoryKeyValueIterator
(note that we'll probably need to buffer one record in advance before returning
true from #hasNext, since otherwise #hasNext could report a result that turns
out to have been deleted).
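A minimal sketch of the proposed fix, using hypothetical SketchStore/KV stand-ins rather than the real InMemoryKeyValueIterator: hasNext() advances through the copied keys until a point lookup returns a non-null value and buffers that one result, and next() hands it out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.NoSuchElementException;
import java.util.TreeMap;

public class SkipNullDemo {
    // Hypothetical stand-in for Kafka's KeyValue<K, V>.
    static final class KV {
        final String key;
        final String value;

        KV(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Hypothetical simplified store whose range iterator skips keys deleted since the copy.
    static final class SketchStore {
        private final NavigableMap<String, String> map = new TreeMap<>();

        synchronized void put(String key, String value) { map.put(key, value); }
        synchronized void delete(String key) { map.remove(key); }
        synchronized String get(String key) { return map.get(key); }

        synchronized Iterator<KV> range(String from, String to) {
            // Copy the matching keys so later writes cannot invalidate the iterator.
            List<String> keys = new ArrayList<>(map.subMap(from, true, to, true).keySet());
            return new SkipDeletedIterator(keys.iterator());
        }

        // Buffers at most one non-null result ahead of the caller.
        private final class SkipDeletedIterator implements Iterator<KV> {
            private final Iterator<String> keys;
            private KV buffered; // null means nothing is buffered

            SkipDeletedIterator(Iterator<String> keys) {
                this.keys = keys;
            }

            @Override
            public boolean hasNext() {
                // Advance past keys whose point lookup returns null (deleted since the copy).
                while (buffered == null && keys.hasNext()) {
                    String key = keys.next();
                    String value = get(key);
                    if (value != null) {
                        buffered = new KV(key, value);
                    }
                }
                return buffered != null;
            }

            @Override
            public KV next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                KV result = buffered;
                buffered = null;
                return result;
            }
        }
    }

    // Starts a scan, deletes "b" (standing in for the StreamThread), then drains the iterator.
    static List<String> scanWithConcurrentDelete() {
        SketchStore store = new SketchStore();
        store.put("a", "1");
        store.put("b", "2");
        store.put("c", "3");
        Iterator<KV> it = store.range("a", "c");
        store.delete("b");
        List<String> values = new ArrayList<>();
        while (it.hasNext()) {
            values.add(it.next().value);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(scanWithConcurrentDelete()); // prints [1, 3]
    }
}
```

Because the value is looked up inside hasNext(), a record deleted between hasNext() and next() is still returned with its last-seen value. That fits the lack of consistency guarantees described above, and the iterator never returns a null value.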