[ https://issues.apache.org/jira/browse/SAMZA-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281030#comment-14281030 ]
Chris Riccomini commented on SAMZA-503:
---------------------------------------

bq. Why does the SimpleConsumer have a separate consumerID? (It does have a clientID of its own - which is a string). Looks like the older version did not have an 'int consumerID' param. Pretty weird

No kidding. This is bizarre. [~jjkoshy]/[~guozhang], any idea why SimpleConsumer.earliestOrLatestOffset takes an int as the client id?

bq. Why is this the case? Don't offsets imply some kind of ordering in some sort of collection of objects? What is the side effect of imposing ordering on the offsets? (Also, which input system stream does not hold this currently?)

I think we can define them however we want. If we have a stronger definition of offsets (e.g. they're ordered, and longs, not strings), then Samza has much better control of things. The trade-off is that we could be excluding other systems that might have unsorted GUIDs, or byte arrays, as their offsets. I haven't seen a real-world example of this, but [~martinkl] mentioned recently that PostgreSQL's changelog replication mechanism uses unordered GUIDs as its offsets.

> Lag gauge very slow to update for slow jobs
> -------------------------------------------
>
>     Key: SAMZA-503
>     URL: https://issues.apache.org/jira/browse/SAMZA-503
>     Project: Samza
>     Issue Type: Bug
>     Components: metrics
>     Affects Versions: 0.8.0
>     Environment: Mac OS X, Oracle Java 7, ProcessJobFactory
>     Reporter: Roger Hoover
>     Assignee: Yan Fang
>     Fix For: 0.9.0
>
>     Attachments: SAMZA-503.patch
>
>
> For slow jobs, the KafkaSystemConsumerMetrics gauge (%s-%s-messages-behind-high-watermark) does not get updated very often.
> To reproduce:
> * Create a job that processes one message and then sleeps for 5 seconds
> * Create its input topic but do not populate it yet
> * Start the job
> * Load 1000s of messages into its input topic. You can keep adding messages with a "watch -n 1 <kafka console producer command>"
> What happens:
> * Run jconsole to view the JMX metrics
> * The %s-%s-messages-behind-high-watermark gauge will stay at 0 for a LONG time (~10 minutes?) before finally updating.
> What should happen:
> * The gauge should get updated at a reasonable interval (at least every few seconds)
> I think what's happening is that the BrokerProxy only updates the high watermark when a consumer is ready for more messages. When the job is so slow, this rarely happens, so the metric doesn't get updated.
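To make the suspected root cause concrete, here is a minimal sketch of how a messages-behind-high-watermark gauge is typically derived from a fetch response. It is illustrative only; the object and method names are assumptions, not the actual Samza BrokerProxy or KafkaSystemConsumerMetrics code. Because the value can only be recomputed when a fetch actually happens, a job that rarely asks for more messages rarely refreshes the gauge.

{code:scala}
// Illustrative sketch only; names and structure are assumptions, not
// the actual Samza BrokerProxy / KafkaSystemConsumerMetrics code.
object LagGaugeSketch {
  // highWatermark: the latest offset the broker reports in a fetch response.
  // nextOffset: the next offset this consumer will read.
  def messagesBehind(highWatermark: Long, nextOffset: Long): Long =
    math.max(0L, highWatermark - nextOffset)

  def main(args: Array[String]): Unit = {
    // The broker is at offset 1000 and the consumer will read offset 40 next,
    // so the gauge should report 960 messages of lag. The catch described in
    // this issue is that the value is only recomputed when a fetch runs.
    println(messagesBehind(1000L, 40L)) // prints 960
  }
}
{code}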
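Returning to the offset discussion in the comment above, here is a small hypothetical sketch of the trade-off between treating offsets as ordered longs versus opaque strings. None of these types are Samza APIs; they only illustrate why ordered offsets give the framework more leverage (computing lag, picking the newer of two checkpoints), while unordered GUID-style offsets, like the PostgreSQL replication example, only support equality checks.

{code:scala}
// Hypothetical offset types for illustration; these are not Samza APIs.
case class OrderedOffset(value: Long) {
  // Ordered, numeric offsets make "how far behind are we" and
  // "which checkpoint is newer" well-defined questions.
  def isAfter(other: OrderedOffset): Boolean = value > other.value
  def lagBehind(highWatermark: OrderedOffset): Long =
    math.max(0L, highWatermark.value - value)
}

case class OpaqueOffset(value: String) {
  // With unordered GUID-style offsets, only equality is meaningful;
  // lag and ordering are not defined.
  def isSame(other: OpaqueOffset): Boolean = value == other.value
}
{code}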