It's hard to tell whether we really need to buffer records until we actually 
get some records. This is a consequence (maybe a downside) of my choice to 
have `TimeDefinition` use the window-end time as "now" and the grace period as 
the `suppressDuration`. Because of this, within the buffering context, even 
with a `suppressDuration` of 0, we might still need to buffer, since the 
effective timestamp is in the future.
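To make that concrete, here's a minimal sketch (not code from this PR; the 
variable names are made up) of why a zero `suppressDuration` still forces 
buffering when "now" is the window end:

```java
// Hypothetical numbers, just to illustrate the arithmetic.
long windowEnd = 1_000L;        // window-end time, used as "now"
long graceMillis = 0L;          // suppressDuration == grace period == 0
long recordTimestamp = 400L;    // the record arrives mid-window

long emitAt = windowEnd + graceMillis;          // 1_000, still in the future
boolean mustBuffer = recordTimestamp < emitAt;  // true, even with a zero duration
```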

Thinking through this, we could instead use the window start as "now" and the 
window size + grace period as the suppress duration, but offhand it seems this 
wouldn't work well with SessionWindows (or other variable-sized windows).
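Roughly why (a toy sketch, not the SessionWindows implementation): a session's 
end keeps moving as records arrive within the inactivity gap, so the "window 
size" isn't known when the first record shows up:

```java
// Toy session: each record within the gap extends the session end.
long inactivityGap = 5_000L;
long sessionStart = 0L;
long sessionEnd = 0L;

long[] recordTimestamps = {0L, 3_000L, 7_000L};
for (long ts : recordTimestamps) {
    if (ts - sessionEnd <= inactivityGap) {
        sessionEnd = Math.max(sessionEnd, ts);  // the session's "size" keeps changing
    }
}
// sessionEnd is now 7_000, which we couldn't have computed at sessionStart,
// so "window size + grace" isn't a usable suppress duration here.
```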

So what I chose to do instead is a lightweight check when I need the buffer, 
initializing it if it hasn't been already. I could even move the 
`if (buffer == null)` check right here, and JIT branch prediction would make 
this lazy check essentially free once the buffer has been initialized.
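Something along these lines (a sketch with made-up names; the actual buffer 
type and fields in the PR may differ):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Stand-in for the suppression processor's buffering path.
class LazyBufferSketch<V> {
    private Deque<V> buffer = null;  // stays null until the first record that needs buffering

    void bufferRecord(final V value) {
        if (buffer == null) {        // the lightweight lazy check
            buffer = new ArrayDeque<>();
        }
        buffer.add(value);
    }
}
```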

Some alternatives:
1. Discard the optimization and just always initialize the buffer, in case we 
need it.
2. Junk the (maybe unnecessarily) flexible `TimeDefinition` function and 
instead use a "time strategy" enum that tells the processor whether to use 
record time or window-end time (see the sketch after this list):
   - With record time, if the duration is zero, we know we'll never need a 
buffer; if it's > 0, we'll probably need one.
   - With window-end time, we'll probably need a buffer regardless of the 
suppression duration.

WDYT?

[ Full content available at: https://github.com/apache/kafka/pull/5693 ]