[ 
https://issues.apache.org/jira/browse/KAFKA-6903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-6903:
---------------------------------
    Description: 
Today in KTable's internal implementation, if old values are needed downstream 
(e.g. if there is a downstream aggregation, so that old values need to be 
re-sent to "subtract" their effects in addition to incorporating the effects of 
new values), we re-compute the old values based on the old values passed in by 
the parent. This behavior has two issues:

1) Re-computing the values means more cost: each updated value is computed 
twice, once as the new value and once as the old value. This additional cost 
could ideally be avoided.

2) If the computational logic depends on some state that can be updated over 
time, then calling the same applied function again may actually yield different 
values, because it sees a different snapshot of that state.

We should consider how to improve this behavior to avoid the above issues. More 
specifically: if the KTable is materialized, we can consider reading the old 
value directly from the materialized store (note that this may not always be 
the best choice, e.g. re-applying a simple filter may be cheaper than reading 
from a persistent store, which takes more time).
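
To make issue 2 concrete, here is a minimal plain-Java sketch. It does not use 
the Kafka Streams API; the class OldValueSketch, the Change pair, the 
multiplier state and the HashMap standing in for a materialized store are all 
hypothetical names chosen for illustration. It compares a downstream sum when 
the old value is re-computed against the same sum when the old value is read 
back from a store.

```java
import java.util.HashMap;
import java.util.Map;

public class OldValueSketch {
    /** Hypothetical stand-in for the (new, old) pair a table forwards downstream. */
    static final class Change {
        final Integer newValue;
        final Integer oldValue;

        Change(Integer newValue, Integer oldValue) {
            this.newValue = newValue;
            this.oldValue = oldValue;
        }
    }

    /** State the mapping logic depends on; it may change between updates. */
    private int multiplier = 1;

    /** Stand-in for a materialized store: last mapped value per key. */
    private final Map<String, Integer> store = new HashMap<>();

    private Integer map(Integer parentValue) {
        return parentValue == null ? null : parentValue * multiplier;
    }

    /** Behavior described in the ticket: map both the parent's new and old value. */
    Change recompute(Integer parentNew, Integer parentOld) {
        return new Change(map(parentNew), map(parentOld));
    }

    /** Alternative under discussion: take the old value from the store instead. */
    Change readFromStore(String key, Integer parentNew) {
        Integer mappedNew = map(parentNew);
        // put/remove both return the value previously stored for the key, if any.
        Integer previous = mappedNew == null ? store.remove(key) : store.put(key, mappedNew);
        return new Change(mappedNew, previous);
    }

    /** Downstream sum: subtract the old value's effect, add the new value's. */
    static int applyToSum(int sum, Change change) {
        if (change.oldValue != null) sum -= change.oldValue;
        if (change.newValue != null) sum += change.newValue;
        return sum;
    }

    static int sumWithRecompute() {
        OldValueSketch table = new OldValueSketch();
        int sum = applyToSum(0, table.recompute(10, null));
        table.multiplier = 2; // the state changes between the two updates
        return applyToSum(sum, table.recompute(20, 10));
    }

    static int sumWithStore() {
        OldValueSketch table = new OldValueSketch();
        int sum = applyToSum(0, table.readFromStore("a", 10));
        table.multiplier = 2; // same state change as above
        return applyToSum(sum, table.readFromStore("a", 20));
    }

    public static void main(String[] args) {
        System.out.println("recompute: " + sumWithRecompute());
        System.out.println("store:     " + sumWithStore());
    }
}
```

In this sketch the re-computed old value (20) differs from what was actually 
sent downstream earlier (10), so the sum drifts away from the single live 
value; the store-based variant subtracts exactly what was added. The extra 
store read is the cost the note above warns about.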

  was:
Today in KTable's internal implementation, if old values are needed downstream 
(e.g. if there is a downstream aggregation, so that old values need to be 
re-sent to "subtract" their effects in addition to incorporating the effects of 
new values), we re-compute the old values based on the old values passed in by 
the parent. This behavior has two issues:

1) Re-computing the values means more cost: each updated value is computed 
twice, once as the new value and once as the old value. This additional cost 
could ideally be avoided.

2) If the computational logic depends on some state that can be updated over 
time, then calling the same applied function again may actually yield different 
values, because it sees a different snapshot of that state.

We should consider how to improve this behavior to avoid the above issues.


> Improve KTable's sending old value behavior
> -------------------------------------------
>
>                 Key: KAFKA-6903
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6903
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Guozhang Wang
>            Priority: Major
>
> Today in KTable's internal implementation, if old values are needed 
> downstream (e.g. if there is a downstream aggregation, so that old values 
> need to be re-sent to "subtract" their effects in addition to incorporating 
> the effects of new values), we re-compute the old values based on the old 
> values passed in by the parent. This behavior has two issues:
> 1) Re-computing the values means more cost: each updated value is computed 
> twice, once as the new value and once as the old value. This additional cost 
> could ideally be avoided.
> 2) If the computational logic depends on some state that can be updated over 
> time, then calling the same applied function again may actually yield 
> different values, because it sees a different snapshot of that state.
> We should consider how to improve this behavior to avoid the above issues. 
> More specifically: if the KTable is materialized, we can consider reading the 
> old value directly from the materialized store (note that this may not always 
> be the best choice, e.g. re-applying a simple filter may be cheaper than 
> reading from a persistent store, which takes more time).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
