Hi Jeremiah,

+1 to what Tom said. Samza currently does not rely on Kafka consumer's
checkpointing behavior and
exposes its own notion of a "lag". This is reported as a per-partition
metric under KafkaSystemConsumerMetrics#messagesBehindHighWatermark
<https://github.com/apache/samza/blob/master/samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemConsumerMetrics.scala#L45>
.
My first recommendation would be to setup alerts on this metric and monitor
it.

To use *Burrow* for lag monitoring, you can extend the *KafkaSystemConsumer*
to implement the
CheckpointListener
<https://samza.apache.org/learn/documentation/0.13/api/javadocs/org/apache/samza/checkpoint/CheckpointListener.html>
interface. This interface allows you to intercept Samza's checkpointing
sequence and
plug-in your own logic. An example implementation of the CheckpointListener
could instantiate
a Kafka consumer and periodically commit Samza's checkpointed offsets to
it.

Please let me know if you have any questions.

Best,
Jagadish





On Tue, Nov 27, 2018 at 10:50 AM Jeremiah Adams <jad...@helixeducation.com>
wrote:

>
> I am referring to the "Lag" that can exist when a Consumer Offset is
> significantly less than the Log Size. This difference is Lag and is often
> symptomatic of a problem - processing has stopped or being overwhelmed etc.
>
> Our Legacy Node.js system uses Consumer Groups, (same as say older
> Spark).  To get the Offset, we can use kafka-consumer-groups.sh tool to get
> the offset. For Ops related work we use Kafkamon for these to get a UI up
> for Ops folks.
>
> Our newer stuff uses Samza and I see zero Consumer Groups. Instead I see
> checkpoint topics (example:
> __samza_checkpoint_ver_1_for_generic-delivery_1). I can consume this topic
> and get the current offset by partition, but I don't have the log size, so
> cannot compute the lag. All I can do is see these numbers increment but
> know clue how behind my process is.
>
> I just took Linkedin's Burrow (https://github.com/linkedin/Burrow) for a
> test drive locally, hoping it would solve my problem due to it looking at
> the internal consumers. However, I have the same problem - can't get data
> on a consumer group that doesn't exist.
>
>
>
>
>
> Jeremiah Adams
> Software Engineer
> www.helixeducation.com
> Blog | Twitter | Facebook | LinkedIn
>
> ________________________________________
> From: Tom Davis <t...@recursivedream.com>
> Sent: Monday, November 26, 2018 6:59 PM
> To: dev@samza.apache.org
> Subject: Re: Alerting and Monitoring Samza Checkpointing?
>
> Have you looked into KafkaSystemConsumerMetrics? Is the meaning of "lag"
> there different from what you mean?
>
>
> Jeremiah Adams <jad...@helixeducation.com> writes:
>
> > We are replacing a node.js app that consumed topics on a Kafka cluster
> with
> > Samza jobs. We use kafka-offsets to trigger alerts based on message lag.
> e.g.,
> > message lag is greater than 10, wake up support persons.
> >
> >
> > Samza doesn't use the same mechanism for offset storage and the tools for
> > examining a topic's checkpoint aren't readily useful for application
> > consumption.
> >
> >
> > Can some of you share your approaches to monitoring and alerting on
> consumer
> > lag?
> >
> >
> > Regards.
> >
> >
> > Jeremiah Adams
> > Software Engineer
> >
> https://url.emailprotection.link/?ahfhEufaAWbezBrUFPG98ZJcterGfIerU3ZwsA3Gv_C0~
> <
> https://url.emailprotection.link/?a49H2rNGIIBtQOw6md8OcHp-qKE3Xn2gNiZ3dlqAeSDA~
> >
> > Blog<
> https://url.emailprotection.link/?a49H2rNGIIBtQOw6md8OcHgFEZu-KYuiu8doY66NWwmmyWxz7kC-27Yfnbdgd2wyh5gjXUa6LMT_NRXsj1g1VVg~~>
> |
> > Twitter<
> https://url.emailprotection.link/?a0Q7ct5_6cOdbJ86kpWB0zx6RbtgugTVC7lU_W7za50jLdZQGpLgVlR1V06zckSaM5oOKb6QBo46Qp9xt0Tt7Aw~~>
> |
> > Facebook<
> https://url.emailprotection.link/?aAmyAO_nS_C1aDgBLeKyGTu0tksTt1_mn2PcS8KJXNJPM04iRHKgX96qGgENV-dMSER5wl8zDVRr3RsS0OmcF9A~~>
> |
> > LinkedIn<
> https://url.emailprotection.link/?aanlcNI-cN74Gdz-TD332xAl6lHu7TRNICWoHUFjYf-KlBjrCGHoYR65b3rl-OyW10nWFv6hwYvUSoVHL4b3vGA~~
> >
>


-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University

Reply via email to