[ https://issues.apache.org/jira/browse/KAFKA-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097032#comment-17097032 ]
Maatari commented on KAFKA-7224:
--------------------------------

{quote}*However, if there was a way to enforce a maximum time a record stays in the buffer without being emitted,*{quote}

{quote}Well, the current suppress does this. Or do you refer to wall-clock time?{quote}

I think there is a bit of confusion here as well. What I mean is exactly the last point I referred to in my previous message. To clarify: if by a *wall-clock-time* emit strategy you mean a strategy that is not event driven, as the author suggested, but driven by the wall clock only, then yes, I do mean wall-clock time when I say this.

{quote}I cannot follow here. If you buffer and suppress updates to the same key and emit updates at a certain "frequency", there is no difference between doing this in-memory and spilling to disk. The only difference is how many unique keys the suppress buffer can handle: for in-memory, the number of unique keys is smaller, as all the data must fit into main memory, while RocksDB would allow processing more unique keys. But the number of unique keys is independent of the number of intermediate results (which you need to count _per key_, as updates to two different keys would never suppress each other).{quote}

You are spot on about my point when you mention that RocksDB would allow processing (suppressing) more unique keys. Besides that, obviously, my thinking was: the more unique keys I can hold, the more suppression I can do without evicting things. However, I do not understand your last statement:

{quote}But the number of unique keys is independent of the number of intermediate results (which you need to count _per key_, as updates to two different keys would never suppress each other).{quote}

Do you not think that the bigger the suppression buffer, whether in memory or on disk, the more suppression you can do?

So far, if I understood you well, it sounds like a combination of KIP-328 + KIP-242 (wall-clock-time emit strategy) would solve my use case, no?
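The point about unique keys vs. intermediate results can be sketched with a toy buffer. This is illustrative only, not the actual Kafka Streams suppress implementation (the class and method names below are made up for the sketch): because an update to an already-buffered key replaces the buffered value, the buffer's occupancy is bounded by the number of unique keys, no matter how many intermediate updates arrive per key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a suppress buffer: one slot per key. A newer update to
// the same key overwrites the buffered one, so occupancy is bounded by
// the number of unique keys, not by the number of intermediate updates.
// (The real buffer additionally tracks time/size for eviction.)
public class SuppressBufferSketch {
    private final Map<String, Long> buffer = new LinkedHashMap<>();

    public void update(String key, long value) {
        buffer.put(key, value); // same-key update replaces the buffered value
    }

    public int uniqueKeysBuffered() {
        return buffer.size();
    }

    public static void main(String[] args) {
        SuppressBufferSketch b = new SuppressBufferSketch();
        // 1000 intermediate updates to the same key occupy one slot...
        for (long i = 0; i < 1000; i++) b.update("k1", i);
        // ...and a second key adds only one more slot.
        b.update("k2", 42);
        System.out.println(b.uniqueKeysBuffered()); // prints 2
    }
}
```

For reference, the existing (stream-time, not wall-clock) API from KIP-328 looks like `table.suppress(Suppressed.untilTimeLimit(Duration.ofMinutes(5), BufferConfig.maxRecords(10_000)))`; spill-to-disk would change only where that buffer lives, not the suppression semantics.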
How to get there is another question, but at least making sure we go in the right direction is important. I like the approach of keeping the semantics of the stream separate from the operational concerns: [https://www.confluent.io/blog/kafka-streams-take-on-watermarks-and-triggers/]

> KIP-328: Add spill-to-disk for Suppression
> ------------------------------------------
>
>                 Key: KAFKA-7224
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7224
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: John Roesler
>            Priority: Major
>
> As described in
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables]
> Following on KAFKA-7223, implement the spill-to-disk buffering strategy.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)