[ 
https://issues.apache.org/jira/browse/KAFKA-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297570#comment-16297570
 ] 

Guozhang Wang commented on KAFKA-6323:
--------------------------------------

[~frederica] I'd suggest the following:

1) STREAM_TIME punctuation: I agree with [~mjsax] for aligning on scheduled 
timestamp; to be more specific, suppose user provided interval is {{T}}, we 
would first schedule the next timstamp as {{T}} exactly; then at any point 
suppose our next scheduled the next timestamp {{T1}}, and stream time has 
advanced to {{T2}} because of received data where {{T2 >= T1}}, then we just 
punctuate with parameter {{floor(T2, T)}} and schedule the next punctuation at 
{{floor(T2, T) + T}}.

2) WALL_CLOCK_TIME: we do not try to align on interval, i.e. with user provided 
interval {{T}}, next scheduled time is {{now + T}}, and at the time we did the 
check with scheduled timestamp {{T1}}, if the current system time is {{T2 (T2 
>= T1)}} we punctuate at {{T2}} and schedule the next punctuation at timestamp 
{{T2 + T}}. The argument is that with long GC / 
single-record-taking-long-time-to-process / etc scenarios, we can never have a 
precise or predictable punctuation based on system wall-clock time, so instead 
we'd just try to expose the exact current system time when punctuation is 
triggered.

> punctuate with WALL_CLOCK_TIME triggered immediately
> ----------------------------------------------------
>
>                 Key: KAFKA-6323
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6323
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 1.0.0
>            Reporter: Frederic Arno
>            Assignee: Frederic Arno
>             Fix For: 1.1.0, 1.0.1
>
>
> When working on a custom Processor from which I am scheduling a punctuation 
> using WALL_CLOCK_TIME. I've noticed that whatever the punctuation interval I 
> set, a call to my Punctuator is always triggered immediately.
> Having a quick look at kafka-streams' code, I could find that all 
> PunctuationSchedule's timestamps are matched against the current time in 
> order to decide whether or not to trigger the punctuator 
> (org.apache.kafka.streams.processor.internals.PunctuationQueue#mayPunctuate). 
> However, I've only seen code that initializes PunctuationSchedule's timestamp 
> to 0, which I guess is what is causing an immediate punctuation.
> At least when using WALL_CLOCK_TIME, shouldn't the PunctuationSchedule's 
> timestamp be initialized to current time + interval?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to