[ 
https://issues.apache.org/jira/browse/KAFKA-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15468302#comment-15468302
 ] 

Guozhang Wang commented on KAFKA-3779:
--------------------------------------

Do all KTable changelog streams contain deduped data, even after KAFKA-3776? 
Generally speaking, a KTable changelog will only be deduped if the KTable has a 
corresponding state store upon creation. Currently we have the following 
scenarios for creating a KTable:

1. {{builder.table()}} to read from a source topic.
2. Aggregation operators that generate a windowed / non-windowed KTable.
3. KTable's non-stateful operators, such as {{filter}}, that generate a new 
KTable.
4. KTable-KTable joins that generate a new KTable.

Today 1) and 2) above have a state store for the generated KTable, and hence it 
is deduped; for 3), as long as the original KTable is deduped, the result will 
be deduped as well; for 4), the resulting KTable is not backed by a state store, 
so it may not be deduped.

Hence the new function {{KTable.getStoreName()}} may still return a null value; 
in this case does it still make sense to add the cache for its {{KTable.to()}} 
function?
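
To make the question concrete, here is a minimal sketch of what "deduplicating 
with an LRU cache" in front of a sink could look like. It is not the Kafka 
Streams implementation: the class {{DedupingLruCache}} and its methods are 
hypothetical names, and it only uses {{java.util.LinkedHashMap}} in access order. 
Repeated updates to a key overwrite each other in the cache, and a record is 
only forwarded when its key is evicted or the cache is flushed.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: dedups per-key updates before forwarding them downstream. */
class DedupingLruCache<K, V> {
    private final int maxEntries;
    // Stand-in for the downstream topic: records that were actually "sent".
    private final List<Map.Entry<K, V>> forwarded = new ArrayList<>();
    private final LinkedHashMap<K, V> cache;

    DedupingLruCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder = true keeps the least recently updated key first.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > DedupingLruCache.this.maxEntries) {
                    // Forward the evicted record so that no key is lost.
                    forwarded.add(new AbstractMap.SimpleImmutableEntry<>(
                            eldest.getKey(), eldest.getValue()));
                    return true;
                }
                return false;
            }
        };
    }

    /** Overwrites any cached value for the key; earlier updates are never forwarded. */
    void put(K key, V value) {
        cache.put(key, value);
    }

    /** Forwards the latest value of every cached key, then empties the cache. */
    void flush() {
        for (Map.Entry<K, V> e : cache.entrySet()) {
            forwarded.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        cache.clear();
    }

    List<Map.Entry<K, V>> forwarded() {
        return forwarded;
    }
}
```

With a capacity of 2, the updates a=1, a=2, b=1, a=3, c=1 followed by a flush 
would forward only three records (b=1, a=3, c=1) instead of five. For a KTable 
without a state store, such a cache would be the only place where this 
deduplication could happen.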

> Add the LRU cache for KTable.to() operator
> ------------------------------------------
>
>                 Key: KAFKA-3779
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3779
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: streams
>    Affects Versions: 0.10.1.0
>            Reporter: Eno Thereska
>             Fix For: 0.10.1.0
>
>
> The KTable.to operator currently does not use a cache. We can add a cache to 
> this operator to deduplicate and reduce data traffic as well. This is to be 
> done after KAFKA-3777.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
