[jira] [Commented] (SAMZA-503) Lag gauge very slow to update for slow jobs

Chris Riccomini (JIRA) Mon, 26 Jan 2015 17:57:12 -0800

    [ 
https://issues.apache.org/jira/browse/SAMZA-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292837#comment-14292837
 ]


Chris Riccomini commented on SAMZA-503:
---------------------------------------

bq. 1) and 2) are easy to implement and almost there. While 3) I think needs a 
little work.

I agree with you. (1) and (2) are what I'd been thinking we'd get if we 
added IncomingMessageEnvelope.getMaxOffset(). The OffsetManager could expose 
the checkpointed offset, the max offset, and their difference (max offset - 
checkpointed offset). The BrokerProxy could expose the current offset, the max 
offset, and messages behind high watermark.
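
To make the arithmetic concrete, here is a rough sketch of the lag 
calculation described above. It is written in Python purely for illustration; 
the Envelope class and messages_behind() helper are hypothetical stand-ins 
(getMaxOffset() is only a proposed API here), and the sketch assumes offsets 
can be represented as integers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Envelope:
    """Hypothetical stand-in for an envelope carrying the proposed max offset."""
    offset: int                 # offset of this message
    max_offset: Optional[int]   # newest offset known upstream; None if unknown


def messages_behind(reference_offset: int, max_offset: Optional[int]) -> Optional[int]:
    """Return max_offset - reference_offset, clamped at zero.

    reference_offset can be the checkpointed offset (OffsetManager view)
    or the current offset (BrokerProxy view). Returns None when the
    system cannot report a max offset.
    """
    if max_offset is None:
        return None
    return max(max_offset - reference_offset, 0)


envelope = Envelope(offset=40, max_offset=100)
checkpointed_offset = 25  # assumed value for illustration

lag_from_checkpoint = messages_behind(checkpointed_offset, envelope.max_offset)
lag_from_current = messages_behind(envelope.offset, envelope.max_offset)
```

Clamping at zero is a design choice in this sketch: it avoids reporting a 
negative lag if the max offset is stale relative to the reference offset.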

For (3), I'm reluctant to add any new APIs directly to SystemConsumer. If we 
keep the existing APIs, then using the IncomingMessageEnvelope.getMaxOffset() 
API seems equivalent to deferring to the underlying consumer, since the 
SystemConsumer implementation must convert its offsets to a long, or put in a 
junk value if it can't.

Another way to think about this would be in terms of timestamps, not messages: 
"10 seconds behind" rather than "10 messages behind". Again, though, if we did 
this, I think we'd most likely add an IncomingMessageEnvelope.getTimestamp() 
method, and do the calculation exactly as we're discussing for getMaxOffset().

[~closeuris], regarding your patch: it looks good. Can you add a test to verify 
that refresh works as expected?

> Lag gauge very slow to update for slow jobs
> -------------------------------------------
>
>                 Key: SAMZA-503
>                 URL: https://issues.apache.org/jira/browse/SAMZA-503
>             Project: Samza
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.8.0
>         Environment: Mac OS X, Oracle Java 7, ProcessJobFactory
>            Reporter: Roger Hoover
>            Assignee: Yan Fang
>             Fix For: 0.9.0
>
>         Attachments: SAMZA-503.1.patch, SAMZA-503.patch
>
>
> For slow jobs, the 
> KafkaSystemConsumerMetrics.%s-%s-messages-behind-high-watermark gauge does 
> not get updated very often.
> To reproduce:
> * Create a job that processes one message and sleeps for 5 seconds
> * Create its input topic but do not populate it yet
> * Start the job
> * Load 1000s of messages to its input topic.  You can keep adding messages 
> with a "watch -n 1 <kafka console producer command>"
> What happens:
> * Run jconsole to view the JMX metrics
> * The %s-%s-messages-behind-high-watermark gauge will stay at 0 for a LONG 
> time (~10 minutes?) before finally updating.
> What should happen:
> * The gauge should get updated at a reasonable interval (at least every few 
> seconds)
> I think what's happening is that the BrokerProxy only updates the high 
> watermark when a consumer is ready for more messages.  When the job is so 
> slow, this rarely happens, so the metric doesn't get updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
