[ https://issues.apache.org/jira/browse/KAFKA-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097025#comment-17097025 ]
Maatari edited comment on KAFKA-7224 at 5/1/20, 12:34 AM:
----------------------------------------------------------
Thank you so much for your clarification, it helps a lot. I will try to clarify some of my confusing statements.

{quote}What do you mean by "at the end of the topology"? There is nothing like this. Note that the input is not a "finite table" but the "infinite table changelog stream".{quote}

I just meant having something like this:

{code:java}ktable0.join(ktable1.groupBy(...).reduce(...)).suppress(...){code}

It was my language that was misleading. I agree with you that it is not a finite table. What I want is to significantly mitigate the intermediate results.

{quote}That is by design. Because the input may contain out-of-order data, time cannot easily be advanced if the input stream "stalls". Otherwise, the whole operation becomes non-deterministic (what might be ok for your use case though). This would require some wall-clock time emit strategy though (as you mentioned already, i.e., KIP-424).{quote}

What you suggest above is exactly what would put me in the right direction, given my use case. I will specifically adopt your language: *wall-clock time emit strategy*. Is that really what was intended in KIP-424? On that page [https://cwiki.apache.org/confluence/display/KAFKA/KIP-424%3A+Allow+suppression+of+intermediate+events+based+on+wall+clock+time] the author specifically says:

_"However, checks against the wall clock are event driven: if no events are received in over 5 seconds, no events will be sent downstream"_

Hence, just to clarify: do you mean the same thing when you say wall-clock time emit strategy? Because if that is the case, the same problem as above will happen: some records can still stay stuck if nothing else comes in. This is important, because I wanted to ask, from your point of view, whether it is even feasible to have wall-clock time used as I mean it.
That is, if a key's wall-clock time has passed the configured time, emit the record anyway, even if no new record has been ingested.
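The behavior asked for above (emit a key's latest value once a configured wall-clock interval has elapsed since its last update, even if no further input ever arrives) could be sketched as a small standalone buffer. This is a hypothetical illustration only: the class and method names are invented here, not an existing Kafka Streams API. In Streams terms, {{flushExpired}} would be driven by a wall-clock punctuator rather than by incoming records.

{code:java}
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a wall-clock emit strategy: each key keeps only its
// latest value, and a record is emitted once its key has been quiet for the
// configured interval -- even if no further input arrives, because the check
// is driven by a clock tick, not by incoming events.
public class WallClockSuppressBuffer<K, V> {
    private final Duration emitAfter;
    private final Map<K, V> latest = new LinkedHashMap<>();
    private final Map<K, Instant> lastUpdate = new LinkedHashMap<>();

    public WallClockSuppressBuffer(Duration emitAfter) {
        this.emitAfter = emitAfter;
    }

    // New input for a key overwrites its buffered value and resets its timer.
    public void put(K key, V value, Instant now) {
        latest.put(key, value);
        lastUpdate.put(key, now);
    }

    // Called periodically by a scheduler, independently of input: returns and
    // removes every record whose key has been idle for at least emitAfter.
    public List<Map.Entry<K, V>> flushExpired(Instant now) {
        List<K> expired = new ArrayList<>();
        for (Map.Entry<K, Instant> e : lastUpdate.entrySet()) {
            if (!now.isBefore(e.getValue().plus(emitAfter))) {
                expired.add(e.getKey());
            }
        }
        List<Map.Entry<K, V>> emitted = new ArrayList<>();
        for (K key : expired) {
            emitted.add(Map.entry(key, latest.remove(key)));
            lastUpdate.remove(key);
        }
        return emitted;
    }
}
{code}

With this shape, a record put at time t and never touched again is still emitted at t + emitAfter, which is exactly the difference from the event-driven check quoted from the KIP-424 page.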
> KIP-328: Add spill-to-disk for Suppression
> ------------------------------------------
>
>                 Key: KAFKA-7224
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7224
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: John Roesler
>            Priority: Major
>
> As described in
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables]
> Following on KAFKA-7223, implement the spill-to-disk buffering strategy.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)