This looks like a big step in the right direction IMO. So am I correct in assuming this idle period would only come into play after startup, while waiting for the initial records to be fetched? In other words, once we have seen records from all topics and have established the stream time, processing will not go idle again, right?
I still feel that timestamp semantics are not wanted in all cases. Consider a simple stream-table join used to augment incoming events, where the table data has been updated recently (and hence carries a later timestamp than some of the incoming events). Currently those events will not be joined at all (assuming the older table records have been compacted away) until the event timestamps start passing the table updates. For use cases like this I'd like to be able to say: always prefer processing the table's backing topic if it has data available, regardless of timestamp.

On Fri, 2018-08-03 at 14:00 -0700, Guozhang Wang wrote:
> Hello all,
>
> I would like to kick off a discussion on the following KIP, to allow users to control when a task can be processed based on its buffered records, and how the stream time of a task is advanced.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization
>
> This is related to one of the root causes of out-of-order data in Kafka Streams. Any thoughts / comments on this topic are more than welcome.
>
> Thanks,
> -- Guozhang
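To make the join concern above concrete, here is a small simulation of min-timestamp record selection across per-topic buffers. This is hypothetical illustrative Python, not actual Kafka Streams code; the buffer layout and the `process_by_min_timestamp` helper are assumptions made for the sketch, but the selection rule (always pick the buffered record with the smallest timestamp) mirrors the behavior being discussed.

```python
def process_by_min_timestamp(buffers):
    """Drain per-topic buffers of (timestamp, value) records, always picking
    the head record with the smallest timestamp (the timestamp-synchronized
    selection discussed in the thread). Returns the processing order."""
    order = []
    while any(buffers.values()):
        # Among topics that still have buffered data, choose the one whose
        # head record has the smallest timestamp.
        topic = min((t for t in buffers if buffers[t]),
                    key=lambda t: buffers[t][0][0])
        order.append((topic, buffers[topic].pop(0)))
    return order

# Stream events arrive with timestamps 1 and 2, but the table key was
# re-written at t=10 and the older table record was compacted away.
buffers = {
    "events": [(1, "e1"), (2, "e2")],
    "table":  [(10, "k=v2")],
}
order = [topic for topic, _ in process_by_min_timestamp(buffers)]
# Min-timestamp selection processes both events before the table update,
# so e1 and e2 find no table entry to join against.
```

Under this ordering, `order` comes out as `["events", "events", "table"]`: the table update is processed last, which is exactly the case where preferring the table's backing topic regardless of timestamp would let the events join successfully.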