[ https://issues.apache.org/jira/browse/KAFKA-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096976#comment-17096976 ]
Matthias J. Sax commented on KAFKA-7224:
----------------------------------------

It seems we use the term "intermediate" result in the same way. However, note that for a KTable-KTable join there is no "final" result: the result is by definition an infinite changelog stream; for each update to the input tables, a new result update record is produced. Hence, the only thing you can do is say: don't give me every update, but (for the same key) only a subset of updates.

{quote}cause if i want to suppress all the intermediary result let say at the end of the topology above{quote}

What do you mean by "at the end of the topology"? There is nothing like this. Note that the input is not a "finite table" but an "infinite table changelog stream".

{quote}given the frequency with which the database is updated, i can find myself with records, stuck in the supression buffer. Indeed it is stream time{quote}

That is by design. Because the input may contain out-of-order data, time cannot easily be advanced if the input stream "stalls". Otherwise, the whole operation becomes non-deterministic (which might be ok for your use case though). This would require some wall-clock time emit strategy though (as you mentioned already, ie, KIP-424).

{quote}However, if there was a way to enforce a maximum time a records stay in the buffer without being emitted,{quote}

Well, the current suppress does this. Or do you refer to wall-clock time?

{quote}and if that buffer was rocksDB, then i think i could massively mitigate those intermediary result, and produce despite the frequency of the db i am ready the data from.{quote}

I cannot follow here. If you buffer and suppress updates to the same key and emit updates at a certain "frequency", there is no difference between doing this in-memory and spilling to disk.
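To make the stream-time point concrete, here is a toy, self-contained sketch (plain Python, not Kafka Streams code; the class and method names are invented for illustration) of a suppress buffer driven by stream time. It shows both behaviors discussed above: a record can sit in the buffer indefinitely if the input stalls, and updates to different keys never suppress each other.

```python
# Toy model of suppress() semantics (NOT Kafka code): stream time only
# advances when new records arrive, so buffered updates can sit
# indefinitely if the input stalls.

class SuppressBuffer:
    """Per-key buffer that emits the latest value for a key once
    stream time has advanced past the record's timestamp + delay."""

    def __init__(self, delay_ms):
        self.delay_ms = delay_ms
        self.stream_time = 0     # max record timestamp seen so far
        self.buffer = {}         # key -> (timestamp, value)

    def process(self, key, value, ts):
        self.stream_time = max(self.stream_time, ts)
        self.buffer[key] = (ts, value)   # newer update replaces older one
        return self._try_emit()

    def _try_emit(self):
        emitted = []
        for key, (ts, value) in list(self.buffer.items()):
            if self.stream_time >= ts + self.delay_ms:
                emitted.append((key, value))
                del self.buffer[key]
        return emitted

buf = SuppressBuffer(delay_ms=100)
assert buf.process("a", 1, ts=0) == []    # buffered; stream time = 0
assert buf.process("a", 2, ts=50) == []   # same key: v=1 is suppressed
# A record for a *different* key advances stream time and flushes "a",
# but it never suppressed "a" itself:
assert buf.process("b", 9, ts=200) == [("a", 2)]
# If no further input arrives, ("b", 9) stays buffered forever:
assert buf.buffer == {"b": (200, 9)}
```

This stall behavior is exactly why a wall-clock emit strategy (KIP-424) would be needed to guarantee a maximum buffering time regardless of input traffic.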
The only difference is how many unique keys the suppress buffer can handle: for in-memory, the number of unique keys is smaller, as all the data must fit into main memory, while RocksDB would allow processing more unique keys. But the number of unique keys is independent of the number of intermediate results (which you need to count _per key_, as updates to two different keys would never suppress each other).

> KIP-328: Add spill-to-disk for Suppression
> ------------------------------------------
>
> Key: KAFKA-7224
> URL: https://issues.apache.org/jira/browse/KAFKA-7224
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: John Roesler
> Priority: Major
>
> As described in
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables]
> Following on KAFKA-7223, implement the spill-to-disk buffering strategy.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)