[ https://issues.apache.org/jira/browse/SAMZA-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287873#comment-14287873 ]
Roger Hoover commented on SAMZA-503:
------------------------------------

bq. The thing that bugs me about doing this in the container-layer (as opposed to the Kafka layer) are the weird assumptions that we have to make:
bq. * The offsets are ordered
bq. * The offsets are longs
bq. * The subtraction of the longs is semantically meaningful

Just thinking out loud here, but I wonder if the container layer could delegate the offset calculation to the system implementation? Since Samza already assumes strong ordering, all systems should be able to tell how far behind they are in terms of the number of messages (a long). Each system could have a different representation for offsets but would implement some interface to calculate its lag: getNumMessagesLag() (see the sketch at the end of this message)

OR... do you think other system implementations could translate their offset representations into longs? From my extremely limited Google search, it looks like the PostgreSQL streaming replication protocol sends offsets as 64-bit integers.

http://www.postgresql.org/docs/9.3/static/protocol-replication.html
http://michael.otacoo.com/postgresql-2/postgres-9-4-feature-highlight-lsn-datatype/

bq. In PostgreSQL terminology, an LSN (Log Sequence Number) is a 64-bit integer used to determine a position in WAL (Write ahead log), used to preserve data integrity. Internally in code, it is managed as XLogRecPtr, a simple 64-bit integer.

> Lag gauge very slow to update for slow jobs
> -------------------------------------------
>
>                 Key: SAMZA-503
>                 URL: https://issues.apache.org/jira/browse/SAMZA-503
>             Project: Samza
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.8.0
>         Environment: Mac OS X, Oracle Java 7, ProcessJobFactory
>            Reporter: Roger Hoover
>            Assignee: Yan Fang
>             Fix For: 0.9.0
>
>         Attachments: SAMZA-503.1.patch, SAMZA-503.patch
>
>
> For slow jobs, the KafkaSystemConsumerMetrics %s-%s-messages-behind-high-watermark gauge does not get updated very often.
> To reproduce:
> * Create a job that processes one message and then sleeps for 5 seconds
> * Create its input topic but do not populate it yet
> * Start the job
> * Load thousands of messages into its input topic. You can keep adding messages with "watch -n 1 <kafka console producer command>"
> What happens:
> * Run jconsole to view the JMX metrics
> * The %s-%s-messages-behind-high-watermark gauge will stay at 0 for a LONG time (~10 minutes?) before finally updating.
> What should happen:
> * The gauge should get updated at a reasonable interval (at least every few seconds)
> I think what's happening is that the BrokerProxy only updates the high watermark when a consumer is ready for more messages. When the job is this slow, that rarely happens, so the metric doesn't get updated.
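A minimal sketch of the delegation idea above: SystemLagReporter, getNumMessagesLag(), and the Kafka-flavored implementation are all hypothetical names, not actual Samza API. The only assumption is that each system can translate its own offset representation into a lag counted in messages.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-system hook (not real Samza API): each system translates
// its own offset representation into lag measured in messages (a long), so
// the container never has to assume offsets are ordered, subtractable longs.
interface SystemLagReporter {
    long getNumMessagesLag(String stream, int partition);
}

// Kafka-flavored sketch: offsets already are longs, so message lag is just
// the high watermark minus the current offset. A PostgreSQL implementation
// could instead derive a message count from its 64-bit LSNs.
class KafkaLagReporter implements SystemLagReporter {
    private final Map<String, Long> highWatermarks = new HashMap<>();
    private final Map<String, Long> currentOffsets = new HashMap<>();

    private static String key(String stream, int partition) {
        return stream + "-" + partition;
    }

    // Called whenever the consumer learns a new offset or high watermark,
    // independently of whether the consumer is ready for more messages.
    void update(String stream, int partition, long current, long highWatermark) {
        currentOffsets.put(key(stream, partition), current);
        highWatermarks.put(key(stream, partition), highWatermark);
    }

    @Override
    public long getNumMessagesLag(String stream, int partition) {
        String k = key(stream, partition);
        long high = highWatermarks.getOrDefault(k, 0L);
        long current = currentOffsets.getOrDefault(k, 0L);
        return Math.max(0L, high - current);
    }
}
{code}

With something like this, the container could poll getNumMessagesLag() on its own timer and publish the gauge, rather than refreshing the high watermark only when a slow consumer finally asks for more messages.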