[ 
https://issues.apache.org/jira/browse/KAFKA-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096863#comment-17096863
 ] 

Maatari edited comment on KAFKA-7224 at 4/30/20, 6:44 PM:
----------------------------------------------------------

What I call an intermediate result arises in the following context. Say you 
have the following topology:
{code:java}
ktable0.join(ktable1.groupBy(...).reduce(...)){code}
Here the reduce simply acts like collectList in KSQL. This is a use case we 
have. There is a repartition topic at the groupBy, so the same records are 
emitted multiple times while the list collected by the reduce keeps growing, 
possibly until the entire repartition topic is consumed. This in turn generates 
multiple results for the join as well, because the same key on the right side 
of the join arrives multiple times. So you systematically end up with 
ever-growing versions of each record. That is what I call an intermediate 
result. This is a way to build views on normalized data, where each entity is 
built with references to all its outgoing links. We used to do that in our 
databases, but it did not scale. 
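
To make the effect concrete, here is a minimal plain-Java sketch that only 
simulates the behaviour described above; it does not use the Kafka Streams API. 
The class name, the simulate method and the sample data are made up for 
illustration, and the sketch assumes every update is forwarded downstream 
(no caching or suppression in between).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation, not Kafka Streams code: it mimics
// ktable0.join(ktable1.groupBy(...).reduce(...)) with in-memory maps.
class IntermediateResultsSketch {

    /**
     * entities plays the role of ktable0 (entity key -> entity value).
     * links plays the role of the records read from the repartition topic,
     * each one a pair {entityKey, linkTarget}.
     * Returns every join result emitted, in order.
     */
    static List<String> simulate(Map<String, String> entities, List<String[]> links) {
        Map<String, List<String>> collected = new HashMap<>();
        List<String> emitted = new ArrayList<>();
        for (String[] link : links) {
            String key = link[0];
            // "reduce" step: append the new link to the list collected so far.
            List<String> list = collected.computeIfAbsent(key, k -> new ArrayList<>());
            list.add(link[1]);
            // "join" step: each update of the right side re-triggers the join,
            // so a new, longer version of the same entity is emitted.
            String entity = entities.get(key);
            if (entity != null) {
                emitted.add(entity + " -> " + list);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        Map<String, String> entities = Map.of("e1", "Entity-1");
        List<String[]> links = List.of(
                new String[] {"e1", "a"},
                new String[] {"e1", "b"},
                new String[] {"e1", "c"});
        // Prints three versions of the same entity:
        // Entity-1 -> [a]
        // Entity-1 -> [a, b]
        // Entity-1 -> [a, b, c]
        simulate(entities, links).forEach(System.out::println);
    }
}
```

Only the last emitted version is the view we actually want; the earlier ones 
are the intermediate results.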


was (Author: maatdeamon):
What I call an intermediate result arises in the following context. Say you 
have the following topology:
{code:java}
ktable0.join(ktable1.groupBy(...).reduce(...)){code}
Here the reduce simply acts like collectList in KSQL. This is a use case we 
have. There is a repartition topic at the groupBy, so the same records are 
emitted multiple times while the list collected by the reduce keeps growing, 
until the entire topic is consumed. This in turn generates multiple results for 
the join as well, because the same key on the right side of the join arrives 
multiple times. So you systematically end up with ever-growing versions of each 
record. That is what I call an intermediate result. This is a way to build 
views on normalized data, where each entity is built with references to all its 
outgoing links. We used to do that in our databases, but it did not scale. 

> KIP-328: Add spill-to-disk for Suppression
> ------------------------------------------
>
>                 Key: KAFKA-7224
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7224
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: John Roesler
>            Priority: Major
>
> As described in 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables]
> Following on KAFKA-7223, implement the spill-to-disk buffering strategy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
