This was the punctuation concern @guozhangwang brought up.
I haven't optimized this yet because I wanted to discuss the available options 
first.

I'm thinking:
1. store the min timestamp in the buffer to make this function cheap when 
there's nothing to do
2. schedule just one punctuator for all the buffers. This would require more 
coordination in the topology builder, and I'm not sure if it would actually 
yield any benefit. Is iterating over buffers any better than iterating over an 
equal number of punctuators?
3. schedule the punctuator less frequently. This would improve performance for 
high-frequency topics, but not for medium to low frequency topics. On the 
downside, it would sacrifice resolution and make the tests a little tricky to 
reason about.
3a. we could probably make a reasonable approximation of the appropriate 
resolution based on the suppression time limit, like 
`min(max(1, suppressDuration / 10), 30 seconds)`, or even tie it to the 
commit interval.
3b. to mitigate the testing problem, we could add a private mechanism to 
directly set the resolution. (not sure this is needed; would like to see how 
awkward it is in practice once we decide on some optimizations)
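
To make options 1 and 3a a bit more concrete, here is a rough, self-contained sketch. It is not the buffer from this PR: `SuppressionSketch`, `TimeOrderedBuffer`, `punctuationInterval`, and `evictUpTo` are hypothetical names, and I'm assuming the `1` in the formula means 1 millisecond, which the formula itself doesn't pin down.

```java
import java.time.Duration;
import java.util.TreeMap;

// Hypothetical sketch only; none of these names come from the Kafka code base.
final class SuppressionSketch {

    // Option 3a: min(max(1, suppressDuration / 10), 30 seconds).
    // Assumption: the lower bound "1" is 1 millisecond.
    static Duration punctuationInterval(Duration suppressDuration) {
        long tenthMs = suppressDuration.toMillis() / 10;
        long upperMs = Duration.ofSeconds(30).toMillis();
        return Duration.ofMillis(Math.min(Math.max(1L, tenthMs), upperMs));
    }

    // Option 1: a buffer that knows its minimum timestamp, so a punctuation
    // with nothing to evict returns after a single lookup.
    static final class TimeOrderedBuffer {
        // timestamp -> number of buffered records with that timestamp
        private final TreeMap<Long, Integer> counts = new TreeMap<>();

        void add(long timestamp) {
            counts.merge(timestamp, 1, Integer::sum);
        }

        // Long.MAX_VALUE when empty, so "min > expiry" is always true then.
        long minTimestamp() {
            return counts.isEmpty() ? Long.MAX_VALUE : counts.firstKey();
        }

        // Evicts every record with timestamp <= expiryTime and returns the count.
        int evictUpTo(long expiryTime) {
            if (minTimestamp() > expiryTime) {
                return 0; // cheap path: nothing is old enough to emit
            }
            int evicted = 0;
            while (!counts.isEmpty() && counts.firstKey() <= expiryTime) {
                evicted += counts.pollFirstEntry().getValue();
            }
            return evicted;
        }
    }
}
```

With this shape, a one-minute suppression would punctuate every 6 seconds, and anything at or above five minutes would hit the 30-second cap. The early return in `evictUpTo` is the part that matters for option 1: however many punctuators we end up scheduling, each idle call only costs a min-timestamp lookup.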

[ Full content available at: https://github.com/apache/kafka/pull/5693 ]