[ https://issues.apache.org/jira/browse/KAFKA-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15468302#comment-15468302 ]
Guozhang Wang commented on KAFKA-3779:
--------------------------------------

Do all KTable changelog streams contain deduped data, even after KAFKA-3776? Generally speaking, a KTable changelog is only deduped if the KTable has a corresponding state store upon creation. Currently we have the following scenarios for creating a KTable:

1. {{builder.table()}} to read from a source topic.
2. Aggregation operators that generate a windowed / non-windowed KTable.
3. KTable's non-stateful operators, such as {{filter}}, that generate a new KTable.
4. KTable-KTable joins that generate a new KTable.

Today 1) and 2) above have a state store for the generated KTable, and hence it is deduped; for 3), as long as the original KTable is deduped, the new one will be deduped as well; for 4), the resulting KTable is not backed by a state store, and hence it may not be deduped.

Hence the new function {{KTable.getStoreName()}} may still return a null value; in this case, does it still make sense to add the cache for its {{KTable.to()}} function?

> Add the LRU cache for KTable.to() operator
> ------------------------------------------
>
>                 Key: KAFKA-3779
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3779
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: streams
>    Affects Versions: 0.10.1.0
>            Reporter: Eno Thereska
>             Fix For: 0.10.1.0
>
>
> The KTable.to operator currently does not use a cache. We can add a cache to
> this operator to deduplicate and reduce data traffic as well. This is to be
> done after KAFKA-3777.
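
To make the deduplication idea in the issue description concrete, here is a minimal sketch of an LRU cache that keeps only the latest value per key and forwards a record downstream when it is evicted or flushed. This is not Kafka Streams code: the class {{DedupLruCache}}, its methods, and the {{forwarded}} list (a stand-in for the sink topic written by {{KTable.to()}}) are hypothetical names chosen for illustration, and the actual design may differ.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Kafka Streams API.
final class DedupLruCache<K, V> {
    // Stand-in for the downstream sink: records that left the cache.
    private final List<Map.Entry<K, V>> forwarded = new ArrayList<>();
    private final LinkedHashMap<K, V> cache;

    DedupLruCache(final int maxEntries) {
        // accessOrder = true keeps entries ordered least recently used first.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxEntries) {
                    // Forward the evicted record before it is dropped.
                    forwarded.add(new SimpleEntry<>(eldest.getKey(), eldest.getValue()));
                    return true;
                }
                return false;
            }
        };
    }

    // A later update for the same key overwrites the cached value,
    // so intermediate updates never reach the sink.
    void put(K key, V value) {
        cache.put(key, value);
    }

    // Forward every cached record (for example, at commit time) and clear the cache.
    void flush() {
        for (Map.Entry<K, V> e : cache.entrySet()) {
            forwarded.add(new SimpleEntry<>(e.getKey(), e.getValue()));
        }
        cache.clear();
    }

    List<Map.Entry<K, V>> forwarded() {
        return forwarded;
    }

    public static void main(String[] args) {
        DedupLruCache<String, Integer> c = new DedupLruCache<>(2);
        c.put("a", 1);
        c.put("a", 2); // overwrites a=1 in the cache
        c.put("b", 1);
        c.flush();
        // "a" is forwarded once, carrying only its latest value.
        System.out.println(c.forwarded());
    }
}
```

With such a cache, three updates produce two downstream records. Note that this deduplicates regardless of whether the upstream KTable is backed by a state store, which is the case the question about {{KTable.getStoreName()}} returning null is concerned with.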