[ https://issues.apache.org/jira/browse/KAFKA-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163529#comment-17163529 ]

Igor Piddubnyi commented on KAFKA-8582:
---------------------------------------

Hi [~mjsax], as discussed in the PR, please assign the ticket to me.

> Consider adding an ExpiredWindowRecordHandler to Suppress
> ---------------------------------------------------------
>
>                 Key: KAFKA-8582
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8582
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: John Roesler
>            Priority: Major
>
> I got some feedback on Suppress:
> {quote}Specifying how to handle events outside the grace period does seem 
> like a business concern, and simply discarding them thus seems risky (for 
> example imagine any situation where money is involved).
> This sort of situation is addressed by the late-triggering approach 
> associated with watermarks 
> (https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102), given 
> this I wondered if you were considering adding anything similar?{quote}
> It seems like, if a record arrives past the grace period for its window, 
> then the state of the windowed aggregation would already have been lost, so 
> if we were to compute an aggregation result, it would be incorrect. Plus, 
> since the window is already expired, we can't store the new (incorrect, but 
> more importantly expired) aggregation result either, so any subsequent 
> super-late records would also face the same blank slate. Concretely: if you 
> have three timely records for a window, and then three more that arrive 
> after the grace period, and you were doing a count aggregation, you'd see 
> the counts emitted for the window as [1, 2, 3, 1, 1, 1] (a sketch of such a 
> windowed count follows the quoted description below). I guess we could add 
> a flag to the post-expiration results to indicate that they're broken, but 
> this seems like the wrong approach. The post-expiration aggregation 
> _results_ are meaningless, but I could see wanting to send the 
> past-expiration _input records_ to a dead-letter queue or something instead 
> of dropping them.
> Along this line of thinking, I wonder if we should add an optional 
> past-expiration record handler interface to the suppression operator. Then, 
> you could define your own logic, whether that's forwarding to a dead-letter 
> queue, sending the record to an alerting pipeline, or even just crashing 
> the application before it can do something wrong (a hypothetical sketch of 
> such an interface also follows below). This would be a similar pattern to 
> how we allow custom logic to handle deserialization errors by supplying an 
> org.apache.kafka.streams.errors.DeserializationExceptionHandler.
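
To make the scenario above concrete, here is a minimal sketch of the kind of suppressed windowed count under discussion. The topic name "events", the serdes, and the window/grace durations are illustrative assumptions; today, a record that arrives more than the grace period after its window ends is simply dropped, with no hook for custom handling.

{code:java}
// A minimal sketch of a suppressed windowed count, assuming a String-keyed
// input topic named "events". All names and durations here are illustrative.
import java.time.Duration;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.TimeWindows;

public class SuppressedWindowedCount {

    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("events", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey()
               // 5-minute windows; records arriving more than 1 minute after a
               // window ends are past the grace period and are dropped today,
               // with no opportunity for custom handling.
               .windowedBy(TimeWindows.of(Duration.ofMinutes(5)).grace(Duration.ofMinutes(1)))
               .count()
               // emit only the final count per window, once the window closes
               .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
               .toStream()
               .foreach((windowedKey, count) -> System.out.println(windowedKey + " -> " + count));
        return builder.build();
    }
}
{code}

With this configuration, the three super-late records from the [1, 2, 3, 1, 1, 1] example would never reach the count at all under current behavior.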
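And here is a purely hypothetical sketch of the handler interface the ticket title suggests. The method name and signature are assumptions modeled on DeserializationExceptionHandler, not an agreed-upon API.

{code:java}
// Hypothetical interface sketch; the name and signature are assumptions
// modeled on org.apache.kafka.streams.errors.DeserializationExceptionHandler.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.ProcessorContext;

public interface ExpiredWindowRecordHandler {

    /**
     * Invoked when a record arrives after the grace period of its window has
     * already passed, instead of silently dropping it. An implementation
     * might produce to a dead-letter topic, emit an alert, or throw an
     * exception to crash the application before it can do something wrong.
     */
    void handleExpiredRecord(final ProcessorContext context,
                             final ConsumerRecord<byte[], byte[]> record);
}
{code}

As with the deserialization handler, a default implementation could simply preserve the current behavior of dropping the record.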



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
