Hi John,

Thanks for taking. look at the KIP!

The point about stream time not advancing in case of infrequent updates is
an interesting one. I can imagine if the upstream producer to a Kafka
Streams application is a Source Connector which isn't sending records
frequently(due to the nature of the data ingestion for example), then the
downstream stream processing can land into the issues you described above.

Which also brings me to the second point you made about how this would be
used by downstream consumers. IIUC, you are referring to the consumers of
the newly added topic i.e the heartbeat topic. In my mind, the heartbeat
topic is an internal topic (similar to offsets/config/status topic in
connect), the main purpose of which is to trick the framework to produce
records to the offsets topic and advance the offsets. Since every connector
could have a different definition of offsets(LSN, BinLogID etc for
example), that logic to determine what the heartbeat records should be
would have to reside in the actual connector.

Now that I think of it, it could very well be consumed by downstream
consumers/ Streams or Flink Applications and be further used for some
decision making. A very crude example could be let's say if the heartbeat
records sent to the new heartbeat topic include timestamps, then the
downstream streams application can use that timestamp to close any time
windows. Having said that, it still appears to me that it's outside the
scope of the Connect framework and is something which is difficult to
generalise because of the variety of Sources and the definitions of offsets.

But, I would still be more than happy to add this example if you think it
can be useful in getting a better understanding of the idea and also its
utility beyond connect. Please let me know!

Thanks!
Sagar.


On Fri, Mar 24, 2023 at 7:22 PM John Roesler <vvcep...@apache.org> wrote:

> Thanks for the KIP, Sagar!
>
> At first glance, this seems like a very useful feature.
>
> A common pain point in Streams is when upstream producers don't send
> regular updates and stream time cannot advance. This causes
> stream-time-driven operations to appear to hang, like time windows not
> closing, suppressions not firing, etc.
>
> From your KIP, I have a good idea of how the feature would be integrated
> into connect, and it sounds good to me. I don't quite see how downstream
> clients, such as a downstream Streams or Flink application, or users of the
> Consumer would make use of this feature. Could you add some examples of
> that nature?
>
> Thank you,
> -John
>
> On Fri, Mar 24, 2023, at 05:23, Sagar wrote:
> > Hi All,
> >
> > Bumping the thread again.
> >
> > Sagar.
> >
> >
> > On Fri, Mar 10, 2023 at 4:42 PM Sagar <sagarmeansoc...@gmail.com> wrote:
> >
> >> Hi All,
> >>
> >> Bumping this discussion thread again.
> >>
> >> Thanks!
> >> Sagar.
> >>
> >> On Thu, Mar 2, 2023 at 3:44 PM Sagar <sagarmeansoc...@gmail.com> wrote:
> >>
> >>> Hi All,
> >>>
> >>> I wanted to create a discussion thread for KIP-910:
> >>>
> >>>
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-910%3A+Update+Source+offsets+for+Source+Connectors+without+producing+records
> >>>
> >>> Thanks!
> >>> Sagar.
> >>>
> >>
>

Reply via email to