[ https://issues.apache.org/jira/browse/SAMZA-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288506#comment-14288506 ]
Yan Fang commented on SAMZA-503:
--------------------------------

It seems the problem now is that subtracting one offset from another does not always give the number of messages lagged. I don't think there is a one-size-fits-all approach, but we can do a few things:

1) Only expose the related offsets as gauges, such as the committed offset, the latest offset in the incoming stream, and the latest offset in the BrokerProxy. Users can then do the calculation outside Samza when they think the subtraction is meaningful.
2) Also expose some subtractions, like the existing "-messages-behind-high-watermark" gauge. These are meaningful in most situations, but we would document caveats for users running other systems.
3) As Roger suggested, translate the offsets into a meaningful representation, such as numberMessagesLag.

1) and 2) are easy to implement and almost done, while 3), I think, needs a little more work.

> Lag gauge very slow to update for slow jobs
> -------------------------------------------
>
>                 Key: SAMZA-503
>                 URL: https://issues.apache.org/jira/browse/SAMZA-503
>             Project: Samza
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.8.0
>         Environment: Mac OS X, Oracle Java 7, ProcessJobFactory
>            Reporter: Roger Hoover
>            Assignee: Yan Fang
>             Fix For: 0.9.0
>
>         Attachments: SAMZA-503.1.patch, SAMZA-503.patch
>
>
> For slow jobs, the KafkaSystemConsumerMetrics.%s-%s-messages-behind-high-watermark gauge does not get updated very often.
> To reproduce:
> * Create a job that processes one message and sleeps for 5 seconds
> * Create its input topic but do not populate it yet
> * Start the job
> * Load 1000s of messages to its input topic. You can keep adding messages with a "watch -n 1 <kafka console producer command>"
> What happens:
> * Run jconsole to view the JMX metrics
> * The %s-%s-messages-behind-high-watermark gauge will stay at 0 for a LONG time (~10 minutes?) before finally updating.
> What should happen:
> * The gauge should get updated at a reasonable interval (at least every few seconds)
> I think what's happening is that the BrokerProxy only updates the high watermark when a consumer is ready for more messages. When the job is this slow, that rarely happens, so the metric doesn't get updated.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
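A minimal Java sketch of how options 1) and 2) from the comment might combine with the timer-based refresh the issue asks for. This is not Samza's actual code: `OffsetLagTracker`, `onNextOffset`, and the `highWatermarkSource` supplier (standing in for whatever broker metadata query a real consumer would use) are hypothetical. It assumes Kafka-style dense integer offsets, where the high watermark is one past the last committed message; for other systems the subtraction may not equal a message count, which is the caveat raised in the comment.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/**
 * Hypothetical per-partition tracker: refreshes the high watermark on its own
 * timer, so the lag gauge moves even when the task is too slow to fetch.
 */
final class OffsetLagTracker implements AutoCloseable {
    private static final long UNKNOWN = -1L;

    // Option 1: raw offsets, exposed as-is.
    private final AtomicLong nextOffset = new AtomicLong(UNKNOWN);    // next offset the task will read
    private final AtomicLong highWatermark = new AtomicLong(UNKNOWN); // one past last committed message

    // Assumption: stands in for a broker metadata query; not a real Samza API.
    private final LongSupplier highWatermarkSource;

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "offset-lag-refresh");
                t.setDaemon(true); // do not keep the JVM alive just for metrics
                return t;
            });

    OffsetLagTracker(LongSupplier highWatermarkSource) {
        this.highWatermarkSource = highWatermarkSource;
    }

    /** Poll the high watermark every periodMillis, independent of fetch activity. */
    void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                refresh();
            } catch (RuntimeException e) {
                // Keep the last known value; an uncaught exception would cancel all future runs.
            }
        }, 0L, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** One refresh step; callable directly, e.g. from tests. */
    void refresh() {
        highWatermark.set(highWatermarkSource.getAsLong());
    }

    /** Called by the consumer whenever its read position advances. */
    void onNextOffset(long offset) {
        nextOffset.set(offset);
    }

    long getNextOffset() {
        return nextOffset.get();
    }

    long getHighWatermark() {
        return highWatermark.get();
    }

    /**
     * Option 2: derived lag. Only a message count when offsets are dense integers;
     * returns -1 while either offset is still unknown, and never goes below 0.
     */
    long getMessagesBehindHighWatermark() {
        long hw = highWatermark.get();
        long next = nextOffset.get();
        if (hw == UNKNOWN || next == UNKNOWN) {
            return UNKNOWN;
        }
        return Math.max(0L, hw - next);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

In this sketch, `start` would be called with a period of a few seconds, matching the "at least every few seconds" expectation in the issue, and a metrics reporter would read `getMessagesBehindHighWatermark()`. The design choice being illustrated is that the refresh runs on its own thread, so the gauge keeps moving even while the task is blocked in a slow `process()` call.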