Hey Chris,

Thanks for jumping in! I have been using consumer lag as an indicator for
some time, but when it is measured directly at the consumer, it does not
account for the time Connect actually spends transforming records and
writing them to the sink system. It is certainly useful for checking
whether the connector is keeping up, but it doesn't really tell us how much
delay Connect itself introduces. We also measure latency based on offset
commit times, but as you mentioned, this is often dominated by the commit
interval. That limits our ability to offer sub-second latency SLOs, which
matters because we use Connect for real-time data movement. Both approaches
do measure whether a connector can keep up with a given throughput, but
we've found that neither lets us define real latency SLOs for Connect as a
real-time service.
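To make the gap concrete, here is a rough sketch. It is a hypothetical illustration, not Kafka Connect code: the class, the `Observed` type and both methods are made up. It assumes offset commits fire at fixed multiples of an assumed 60 s commit interval, so a record's offset only becomes visible at the first commit after the sink write.

```java
import java.util.List;

// Hypothetical sketch, not Kafka Connect API code: contrasts per-record
// latency (sink write time minus record timestamp) with commit-based
// latency (time of the offset commit covering the record minus record
// timestamp), assuming commits fire at fixed multiples of the interval.
public class LatencySketch {

    // A record as observed by a hypothetical sink task, times in millis.
    static final class Observed {
        final long producedAtMs; // record timestamp
        final long writtenAtMs;  // when the task finished writing it to the sink

        Observed(long producedAtMs, long writtenAtMs) {
            this.producedAtMs = producedAtMs;
            this.writtenAtMs = writtenAtMs;
        }
    }

    // Latency the record actually experienced before reaching the sink.
    static long writeLatencyMs(Observed r) {
        return r.writtenAtMs - r.producedAtMs;
    }

    // Latency as inferred from commits: the record's offset only shows up
    // at the first commit strictly after the write.
    static long commitLatencyMs(Observed r, long commitIntervalMs) {
        long nextCommitMs = (r.writtenAtMs / commitIntervalMs + 1) * commitIntervalMs;
        return nextCommitMs - r.producedAtMs;
    }

    public static void main(String[] args) {
        long commitIntervalMs = 60_000; // assumed 60 s commit interval
        List<Observed> records = List.of(
                new Observed(1_000, 1_200),
                new Observed(30_000, 30_150),
                new Observed(59_900, 60_050));

        for (Observed r : records) {
            System.out.printf("write latency %d ms, commit-based latency %d ms%n",
                    writeLatencyMs(r), commitLatencyMs(r, commitIntervalMs));
        }
    }
}
```

With those made-up numbers, every record reaches the sink within 150 to 200 ms. The commit-based figure, however, ranges from 30 s to just over 60 s, so it can't back a sub-second SLO.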

- Jordan

On Tue, Sep 7, 2021 at 2:43 PM Chris Egerton <chr...@confluent.io.invalid> wrote:

> Hi Jordan,
>
> Thanks for the KIP. I'm curious about a possible alternative in which the
> consumer lag for the sink connector is monitored instead of the metric
> newly proposed in the KIP. Although sink tasks can't directly report the
> successful write of a record to the sink system, they are responsible for
> monitoring and communicating it indirectly, in the form of the offsets
> returned from the SinkTask::preCommit method. This should mean that, for
> any well-behaved connector that returns accurate offsets from its
> preCommit method, the consumer lag for the connector is a decent way to
> monitor latency. That includes connectors that perform synchronous writes
> in SinkTask::put: in most cases they will not override the default
> behavior of preCommit, so the most up-to-date offsets read from each topic
> will be committed to the consumer. Of course, this lag will be at the
> mercy of the connector's commit interval and of whether the connector can
> successfully commit offsets with its consumer. But since those committed
> offsets often dictate where tasks resume from after a restart, there's
> still plenty of value in this metric.
>
> Cheers,
>
> Chris
>
> On Thu, Sep 2, 2021 at 7:03 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:
>
> > Thanks Jordan, this is a major blind spot today.
> >
> > Ryanne
> >
> >
> > On Wed, Sep 1, 2021, 6:03 PM Jordan Bull <jordangb...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I would like to start the discussion for KIP-767, which adds latency
> > > metrics to Connect. The KIP can be found at
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-767%3A+Connect+Latency+Metrics
> > >
> > > Thanks,
> > > Jordan
> > >
> >
>
