[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lars Hofhansl resolved HBASE-11143.
-----------------------------------
Resolution: Fixed
Hadoop Flags: Reviewed
Committed to 0.94, 0.98, and trunk. Thanks J-D, Andy, and Stack.
> Improve replication metrics
> ---------------------------
>
> Key: HBASE-11143
> URL: https://issues.apache.org/jira/browse/HBASE-11143
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Fix For: 0.99.0, 0.94.20, 0.98.3
>
> Attachments: 11143-0.94-v2.txt, 11143-0.94-v3.txt, 11143-0.94.txt,
> 11143-trunk.txt
>
>
> We are trying to report on replication lag and find that there is no good
> single metric to do that.
> ageOfLastShippedOp is close, but unfortunately it is increased even when
> there is nothing to ship on a particular RegionServer.
> I would like to discuss a few options here:
> Add a new metric: replicationQueueTime (or something) with the above meaning.
> I.e., if we have something to ship, we set it to the age of the last shipped
> edit; if we fail, we keep increasing that age (just like we do now). But if
> there is nothing to replicate, we set the last-shipped time to the current
> time (and hence the metric is reported as close to 0).
> Alternatively, we could change the meaning of ageOfLastShippedOp so that it
> behaves this way. That might lead to surprises, but the current behavior is
> clearly weird when there is nothing to replicate.
> Comments? [~jdcryans], [~stack].
> If the approach sounds good, I'll make a patch for all branches.
> Edit: Also adds a new shippedKBs metric to track the amount of data that is
> shipped via replication.
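
For illustration only, the metric semantics proposed in the description above
can be sketched roughly as follows. This is not the committed patch: the class
name, method names, and the explicit nowMs parameters are all hypothetical, and
the real replication source code in HBase is organized differently. The sketch
only shows the three cases (edit shipped, ship failed, nothing to replicate)
plus a simple shipped-KB counter.

```java
// Hypothetical sketch of the proposed replication-lag semantics.
// None of these names come from HBase; they are assumptions for illustration.
class ReplicationLagSketch {
    // Write time of the last edit we shipped (or "now" when the queue was empty).
    private long lastShippedEditTimeMs;
    // The reported age, i.e. the replication lag estimate in milliseconds.
    private long ageMs;
    // Total bytes shipped so far, reported in KB.
    private long shippedBytes;

    ReplicationLagSketch(long nowMs) {
        this.lastShippedEditTimeMs = nowMs;
        this.ageMs = 0L;
    }

    // Case 1: something was shipped. The age is that of the last shipped edit.
    void shipped(long editWriteTimeMs, long sizeBytes, long nowMs) {
        lastShippedEditTimeMs = editWriteTimeMs;
        shippedBytes += sizeBytes;
        ageMs = nowMs - lastShippedEditTimeMs;
    }

    // Case 2: shipping failed. The last shipped time is unchanged,
    // so the reported age keeps growing.
    void shipFailed(long nowMs) {
        ageMs = nowMs - lastShippedEditTimeMs;
    }

    // Case 3: nothing to replicate. Treat "now" as the last shipped time,
    // so the reported age drops to 0 instead of growing while idle.
    void queueEmpty(long nowMs) {
        lastShippedEditTimeMs = nowMs;
        ageMs = 0L;
    }

    long getAgeMs() {
        return ageMs;
    }

    long getShippedKBs() {
        return shippedBytes / 1024L;
    }
}
```

Under these assumptions, an idle RegionServer would report an age of 0, while a
RegionServer that cannot reach its peer would report a steadily increasing age,
which is the distinction the description says ageOfLastShippedOp fails to make.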