This looks like a big step in the right direction IMO. So am I correct in 
assuming this idle period would only come into play after startup, when waiting 
for initial records to be fetched? In other words, once we have seen records 
from all topics and have established the stream time, processing will not go 
idle again, right?

I still feel that timestamp semantics are not wanted in all cases. Consider a 
simple stream-table join to augment incoming events where the table data has 
been updated recently (and hence has a later timestamp than some incoming 
events). Currently those events will not be joined at all (assuming older table 
records have been compacted) until the event timestamps catch up to the table 
updates. For use cases like this I'd like to be able to say: always prefer 
processing the table's backing topic if it has data available, regardless of 
timestamp.
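To make the starvation concrete, here is a minimal Python sketch (not actual Kafka Streams code; the topic names and the `prefer_topics` override are hypothetical, standing in for the behavior I'm asking for) contrasting pure timestamp-ordered record selection with an always-prefer-table policy:

```python
def next_record(buffers, prefer_topics=()):
    """Pick the next buffered record to process.

    buffers: dict mapping topic -> list of (timestamp, value) records,
             each list ordered by offset (head is buffers[topic][0]).
    prefer_topics: topics to drain first regardless of timestamp --
                   the hypothetical override discussed above.
    """
    # Preferred topics (e.g. a table's backing topic) win outright.
    for topic in prefer_topics:
        if buffers.get(topic):
            return topic, buffers[topic].pop(0)
    # Otherwise mimic timestamp synchronization: process the record
    # with the smallest head timestamp across all topics.
    candidates = [(recs[0][0], topic) for topic, recs in buffers.items() if recs]
    if not candidates:
        return None
    _, topic = min(candidates)
    return topic, buffers[topic].pop(0)

# Events carry old timestamps; the table update is recent.
buffers = {
    "events": [(100, "e1"), (110, "e2")],
    "table-changelog": [(500, "t1")],
}

# Pure timestamp ordering keeps picking events until their timestamps
# pass 500, so they join against stale (possibly compacted-away) state.
topic, _ = next_record({k: list(v) for k, v in buffers.items()})
assert topic == "events"

# Preferring the table topic processes its update first, so subsequent
# events join against fresh table state.
topic, _ = next_record({k: list(v) for k, v in buffers.items()},
                       prefer_topics=("table-changelog",))
assert topic == "table-changelog"
```

The point is just that a per-topic preference knob would sidestep the timestamp comparison entirely for cases like this, rather than requiring the table updates to be artificially re-timestamped.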

On Fri, 2018-08-03 at 14:00 -0700, Guozhang Wang wrote:

Hello all,

I would like to kick off a discussion on the following KIP, to allow users
control when a task can be processed based on its buffered records, and how
the stream time of a task is advanced.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization

This is related to one of the root causes of out-of-order data in Kafka
Streams. Any thoughts / comments on this topic are more than welcome.

Thanks,
-- Guozhang

