[ https://issues.apache.org/jira/browse/KAFKA-10847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290685#comment-17290685 ]
Guozhang Wang commented on KAFKA-10847:
---------------------------------------

I agree that we should delete expired records when we emit their joined results. Since we currently do a range query + deletion per input record, I guess that in practice we would only expire very few records each time. If range query + deletion turns out to be an overhead in practice, we can consider:

1) doing the range query + deletion less frequently, so that each time we get a reasonable number of records to expire, and
2) using range deletion (https://rocksdb.org/blog/2018/11/21/delete-range.html), which would be efficient especially if we have more records to expire in one call.

bq. I do a single-lookup in the store to check if the key is there, if not, then it continues; otherwise it calls the put(key, null) to delete it.

Just syntactic sugar: you can call `putIfAbsent(key, null)` instead.

> Avoid spurious left/outer join results in stream-stream join
> -------------------------------------------------------------
>
>                 Key: KAFKA-10847
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10847
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Assignee: Sergio Peña
>            Priority: Major
>
> KafkaStreams follows an eager execution model, i.e., it never buffers input
> records but processes them right away. For a left/outer stream-stream join,
> this implies that a left/outer join result might be emitted before the window
> end (or window close) time is reached. Thus, a record that will be part of an
> inner-join result might produce an eager (and spurious) left/outer join
> result.
> We should change the implementation of the join to not emit eager left/outer
> join results, but instead delay the emission of such results until after the
> window grace period has passed.
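
As a rough illustration of the batched-expiry idea in the comment (collect all records older than a cutoff with one range query, emit them, then remove the whole range in one call), here is a minimal sketch. It is not the Kafka Streams implementation and does not use the Kafka Streams or RocksDB APIs: `ExpiredRecordBuffer`, `add`, `expire`, and `size` are hypothetical names, and a `java.util.TreeMap` keyed by timestamp stands in for the time-ordered store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a time-ordered buffer of records that have not
// (yet) found a join partner. Not part of Kafka Streams.
final class ExpiredRecordBuffer {
    // Assumption: records are ordered by timestamp only; a real store would
    // key on something like (timestamp, record key).
    private final NavigableMap<Long, List<String>> pending = new TreeMap<>();

    // Buffer a record under its timestamp.
    void add(long timestamp, String record) {
        pending.computeIfAbsent(timestamp, t -> new ArrayList<>()).add(record);
    }

    // Returns every record with timestamp strictly below the cutoff, in
    // timestamp order, and removes them from the buffer.
    List<String> expire(long cutoff) {
        // "Range query": a live view of all entries older than the cutoff.
        NavigableMap<Long, List<String>> expired = pending.headMap(cutoff, false);
        List<String> out = new ArrayList<>();
        for (List<String> records : expired.values()) {
            out.addAll(records);
        }
        // "Range deletion": clearing the view removes the entries from the
        // backing map in one call instead of one delete per key.
        expired.clear();
        return out;
    }

    // Number of distinct timestamps still buffered.
    int size() {
        return pending.size();
    }
}
```

Calling `expire` less often (for example, only when stream time has advanced by some threshold) would let each call cover more records, which is the case where a single range deletion pays off compared with per-key deletes.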