This was the punctuation concern @guozhangwang brought up. I haven't optimized this yet because I wanted to discuss the available options first.
I'm thinking:

1. Store the min timestamp in the buffer to make this function cheap when there's nothing to do.
2. Schedule just one punctuator for all the buffers. This would require more coordination in the topology builder, and I'm not sure if it would actually yield any benefit. Is iterating over buffers any better than iterating over an equal number of punctuators?
3. Schedule the punctuator less frequently. This would improve performance for high-frequency topics, but not for medium to low frequency topics. On the downside, it would sacrifice resolution and make the tests a little tricky to reason about.
   - 3a. We could probably make a reasonable approximation of the appropriate resolution based on the suppression time limit, like `min( max(1, suppressDuration / 10), 30 seconds)`, or even tie it to the commit interval.
   - 3b. To mitigate the testing problem, we could add a private mechanism to directly set the resolution. (Not sure this is needed; I would like to see how awkward it is in practice once we decide on some optimizations.)

[ Full content available at: https://github.com/apache/kafka/pull/5693 ]
This message was relayed via gitbox.apache.org for [email protected]
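To make the 3a formula concrete, here is a rough sketch of how the resolution could be derived from the suppression time limit. The class and method names are hypothetical, and I'm assuming the `1` in the formula means 1 millisecond, since the unit isn't pinned down above. This is only meant to show the clamping, not how it would be wired into the punctuator scheduling.

```java
import java.time.Duration;

// Hypothetical helper, not part of the PR: clamps a punctuation resolution
// derived from the suppression duration, following
// min(max(1, suppressDuration / 10), 30 seconds).
final class PunctuationResolution {
    // Assumption: the lower bound "1" is read as 1 millisecond.
    private static final Duration FLOOR = Duration.ofMillis(1);
    private static final Duration CEILING = Duration.ofSeconds(30);

    private PunctuationResolution() { }

    static Duration resolutionFor(final Duration suppressDuration) {
        // One tenth of the suppression time limit.
        final Duration tenth = suppressDuration.dividedBy(10);
        // max(FLOOR, tenth)
        final Duration atLeastFloor = tenth.compareTo(FLOOR) < 0 ? FLOOR : tenth;
        // min(atLeastFloor, CEILING)
        return atLeastFloor.compareTo(CEILING) > 0 ? CEILING : atLeastFloor;
    }
}
```

With these assumed bounds, a 10 second suppression limit would punctuate every second, anything under 10 ms would bottom out at 1 ms, and anything over 5 minutes would cap at 30 seconds.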
