[ https://issues.apache.org/jira/browse/KAFKA-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096976#comment-17096976 ]

Matthias J. Sax commented on KAFKA-7224:
----------------------------------------

It seems we use the term "intermediate" result in the same way. However, note 
that for a "KTable-KTable" join there is no "final" result: the result is by 
definition an infinite changelog stream: for each update to the input tables, a 
new result update record is produced. Hence, the only thing you can do is to 
say: don't give me every update, but (for the same key) only a subset of the 
updates.
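
For illustration, a rough sketch of that "subset of updates" semantics with the current API (topic names, value types, and the join logic are made up for the example):
{code:java}
import java.time.Duration;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.Suppressed.BufferConfig;

StreamsBuilder builder = new StreamsBuilder();
// Both inputs are infinite table changelog streams, not finite tables
// (default serdes assumed to be configured).
KTable<String, String> left = builder.table("left-topic");
KTable<String, String> right = builder.table("right-topic");

// Every update to either input produces a new result record, so the
// join result has no "final" version either.
KTable<String, String> joined = left.join(right, (l, r) -> l + "," + r);

joined
    // Per key, emit at most one update per minute of stream time,
    // suppressing the intermediate updates in between.
    .suppress(Suppressed.untilTimeLimit(Duration.ofMinutes(1), BufferConfig.unbounded()))
    .toStream()
    .to("joined-output");
{code}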
{quote}because if I want to suppress all the intermediary results, let's say at 
the end of the topology above
{quote}
What do you mean by "at the end of the topology"? There is nothing like this. 
Note that the input is not a "finite table" but an "infinite table changelog 
stream".
{quote}given the frequency with which the database is updated, I can find 
myself with records stuck in the suppression buffer. Indeed it is stream time
{quote}
That is by design. Because the input may contain out-of-order data, time cannot 
easily be advanced if the input stream "stalls". Otherwise, the whole operation 
becomes non-deterministic (which might be ok for your use case though). This 
would require some wall-clock-time emit strategy though (as you mentioned 
already, i.e., KIP-424).
{quote}However, if there was a way to enforce a maximum time a record stays in 
the buffer without being emitted,
{quote}
Well, the current suppress does this. Or do you refer to wall-clock time?
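
For example (a sketch; the durations and buffer sizes are arbitrary), {{untilTimeLimit}} bounds how long a record sits in the buffer, but the bound is measured in stream time, not wall-clock time:
{code:java}
import java.time.Duration;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.Suppressed.BufferConfig;

// Holds each record back for at most 5 minutes of *stream time*; if the
// input stalls, nothing is emitted (a wall-clock trigger is KIP-424).
Suppressed<Object> timeBound =
    Suppressed.untilTimeLimit(Duration.ofMinutes(5), BufferConfig.unbounded());

// A size-based trigger is independent of time: once the buffer is full,
// the oldest suppressed record is emitted early.
Suppressed<Object> sizeBound =
    Suppressed.untilTimeLimit(
        Duration.ofMinutes(5),
        BufferConfig.maxRecords(10_000L).emitEarlyWhenFull());
{code}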
{quote}and if that buffer was RocksDB, then I think I could massively mitigate 
those intermediary results, and produce despite the frequency of the db I am 
reading the data from.
{quote}
I cannot follow here. If you buffer and suppress updates to the same key and 
emit updates at a certain "frequency", there is no difference whether you do 
this in-memory or spill to disk. The only difference is how many unique keys 
the suppress buffer can handle: for in-memory, the number of unique keys is 
smaller, as all the data must fit into main memory, while RocksDB would allow 
processing more unique keys. But the number of unique keys is independent of 
the number of intermediate results (which you need to count _per key_, as 
updates to two different keys would never suppress each other).
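
To make that memory trade-off concrete, these are the buffer bounds the current (in-memory) suppress offers; a RocksDB-backed buffer, as proposed in this ticket, would raise the unique-key capacity without changing the per-key suppression semantics (sizes below are arbitrary examples):
{code:java}
import org.apache.kafka.streams.kstream.Suppressed.BufferConfig;
import org.apache.kafka.streams.kstream.Suppressed.EagerBufferConfig;
import org.apache.kafka.streams.kstream.Suppressed.StrictBufferConfig;

// No bound: the buffer grows with the number of unique keys, limited by heap.
StrictBufferConfig unbounded = BufferConfig.unbounded();

// Bounded: once ~64 MiB are used, emit the oldest record early...
EagerBufferConfig emitEarly =
    BufferConfig.maxBytes(64L * 1024 * 1024).emitEarlyWhenFull();

// ...or shut the application down instead of violating the time bound.
StrictBufferConfig strict =
    BufferConfig.maxBytes(64L * 1024 * 1024).shutDownWhenFull();
{code}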

> KIP-328: Add spill-to-disk for Suppression
> ------------------------------------------
>
>                 Key: KAFKA-7224
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7224
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: John Roesler
>            Priority: Major
>
> As described in 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables]
> Following on KAFKA-7223, implement the spill-to-disk buffering strategy.


