[ https://issues.apache.org/jira/browse/KAFKA-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097032#comment-17097032 ]

Maatari commented on KAFKA-7224:
--------------------------------

{quote}*However, if there was a way to enforce a maximum time a record stays in 
the buffer without being emitted,*
{quote}
{quote}Well, the current suppress does this. Or do you refer to wall-clock time?
{quote}
I think there is a bit of confusion here as well. What I mean is exactly the 
last point I raised in my previous message. So to clarify: if by a 
*wall-clock-time* emit strategy you mean an emit strategy that is not event 
driven, as the author suggested, but driven by the wall clock alone, then yes, 
I do refer to wall-clock time when I say this.
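
For concreteness, here is a minimal sketch of the current behaviour, assuming the Kafka Streams 2.x {{Suppressed}} API (the topic names and limits are made-up examples). The time limit below is measured in stream time, i.e. it only advances as new records arrive, which is exactly the event-driven behaviour I am trying to move away from:

{code:java}
import java.time.Duration;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.Suppressed.BufferConfig;

public class StreamTimeSuppressExample {

    public static Topology topology() {
        StreamsBuilder builder = new StreamsBuilder();

        builder.stream("events", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey()
               .count()
               // untilTimeLimit is driven by *stream time* (record timestamps):
               // the 5-minute clock only advances as new records arrive, so on
               // an idle input nothing is ever emitted.
               .suppress(Suppressed.untilTimeLimit(
                       Duration.ofMinutes(5),
                       BufferConfig.maxRecords(10_000L)))
               .toStream()
               .to("rate-limited-counts", Produced.with(Serdes.String(), Serdes.Long()));

        return builder.build();
    }
}
{code}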



{quote}I cannot follow here. If you buffer and suppress updates to the same key 
and emit updates at a certain "frequency", there is no difference if you do this 
in-memory or if you spill to disk. The only difference is how many unique keys 
the suppress buffer can handle: for in-memory, the number of unique keys is 
smaller, as all the data must fit into main memory, while RocksDB would allow 
processing more unique keys. But the number of unique keys is independent of the 
number of intermediate results (that you need to count _per key_, as updates to 
two different keys would never suppress each other).
{quote}
You are spot on about my point when you mention that RocksDB would allow 
processing (suppressing) more unique keys. Beyond that, obviously, my thinking 
was that the more unique keys I can hold, the more suppression I can do without 
evicting anything. However, I do not understand your last statement.


{quote}But the number of unique keys is independent of the number of 
intermediate results (that you need to count _per key_, as updates to two 
different keys would never suppress each other).
{quote}
Do you not think that the bigger the suppression buffer, whether in memory or 
on disk, the more suppression you can do?
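
To make sure I am reading the existing API right: the buffer keeps only the latest record per key (a new update replaces the buffered one), so its capacity caps the number of *distinct keys* held at once, while the number of updates suppressed per key within the time limit is unbounded. A sketch of the existing {{BufferConfig}} options (the sizes are arbitrary examples):

{code:java}
import org.apache.kafka.streams.kstream.Suppressed.BufferConfig;
import org.apache.kafka.streams.kstream.Suppressed.EagerBufferConfig;
import org.apache.kafka.streams.kstream.Suppressed.StrictBufferConfig;

public class BufferBounds {

    // Strict: never emit early; the application shuts down if the buffer fills.
    static final StrictBufferConfig STRICT =
            BufferConfig.maxBytes(64L * 1024 * 1024).shutDownWhenFull();

    // Eager: when full, emit the oldest buffered results early instead of failing.
    static final EagerBufferConfig EAGER =
            BufferConfig.maxRecords(100_000L).emitEarlyWhenFull();

    // Unbounded: no cap on distinct keys, but heap-only today, which is why a
    // RocksDB-backed (spill-to-disk) buffer would help with large key spaces.
    static final StrictBufferConfig UNBOUNDED = BufferConfig.unbounded();
}
{code}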


So far, if I understood you well, it sounds like a combination of KIP-328 + 
KIP-424 (the wall-clock-time emit strategy) would solve my use case, no? How to 
get there is another question, but at least making sure I am headed in the 
right direction is important. I like the approach of keeping the semantics of 
the stream separate from the operational concerns, as laid out in 
[https://www.confluent.io/blog/kafka-streams-take-on-watermarks-and-triggers/].
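
To illustrate what I am after, here is a purely hypothetical sketch of how the two could compose. Neither {{untilWallClockTimeLimit}} nor {{onDisk}} exists in the released API; both names are invented for illustration only:

{code:java}
// HYPOTHETICAL: untilWallClockTimeLimit(...) and onDisk() do not exist in
// the released Kafka Streams API; the names are invented to illustrate
// composing a KIP-328 spill-to-disk buffer with a wall-clock emit strategy.
counts
        .suppress(Suppressed.untilWallClockTimeLimit(  // wall-clock-driven emit
                Duration.ofMinutes(5),                 // max wall-clock time a record is buffered
                Suppressed.BufferConfig
                        .maxBytes(1024L * 1024 * 1024) // ~1 GiB buffer
                        .onDisk()))                    // KIP-328: spill to RocksDB
        .toStream()
        .to("output");
{code}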

> KIP-328: Add spill-to-disk for Suppression
> ------------------------------------------
>
>                 Key: KAFKA-7224
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7224
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: John Roesler
>            Priority: Major
>
> As described in 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables]
> Following on KAFKA-7223, implement the spill-to-disk buffering strategy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
