[
https://issues.apache.org/jira/browse/HBASE-25881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rahul Kumar reassigned HBASE-25881:
-----------------------------------
Assignee: Rahul Kumar
> Create a chore to update age related metrics.
> ---------------------------------------------
>
> Key: HBASE-25881
> URL: https://issues.apache.org/jira/browse/HBASE-25881
> Project: HBase
> Issue Type: Improvement
> Reporter: Rushabh Shah
> Assignee: Rahul Kumar
> Priority: Major
>
> We had a case where the logRoller and ReplicationShipper threads were stuck
> for a day because some other thread was holding the lock.
> We did not roll the WAL and did not ship any edits for that entire day.
> Still, the oldestWalAge and age-of-last-ship metrics did not spike as they
> should have.
> We calculate every age-related metric as the difference between the current
> time and the time at which the event happened, and we pass that difference
> to the metrics framework. The event timestamp is lost at that point.
> If the thread populating the metric is stuck, the same value is carried
> forward forever, which makes it look like there is no problem in the
> system. In this case the oldestWalAge metric was stuck at 809 and the
> age-of-last-ship metric was 0 the whole time, so no PD alert was fired.
> From Andrew Purtell,
> We have the Chore/ScheduledChore framework. We could be making more use of
> it. Much of this is legacy, before Chore was formalized as it is today.
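
The idea in the description can be illustrated with a minimal sketch: keep the
event timestamp, and let a separate periodic task recompute the age so the
published value keeps growing even when the worker thread is stuck. This is
not the HBase patch. AgeMetric, AgeMetricRefresher, recordEvent and refresh
are hypothetical names, and a plain java.util.concurrent
ScheduledExecutorService stands in for the Chore/ScheduledChore framework.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical age metric that stores the event timestamp, not a precomputed diff. */
class AgeMetric {
  private final LongSupplier clock;          // injectable clock (ms), assumed for testability
  private final AtomicLong lastEventTs;      // time of the last event, in ms
  private final AtomicLong publishedAgeMs = new AtomicLong(); // value a metrics system would read

  AgeMetric(LongSupplier clock) {
    this.clock = clock;
    this.lastEventTs = new AtomicLong(clock.getAsLong());
  }

  /** Called by the worker thread (for example a shipper) when the event happens. */
  void recordEvent() {
    lastEventTs.set(clock.getAsLong());
  }

  /** Called by the periodic task, never by the worker thread itself. */
  void refresh() {
    publishedAgeMs.set(clock.getAsLong() - lastEventTs.get());
  }

  long getPublishedAgeMs() {
    return publishedAgeMs.get();
  }
}

/** Stand-in for a chore: refreshes the metric on its own daemon thread. */
class AgeMetricRefresher {
  static ScheduledExecutorService start(AgeMetric metric, long periodMs) {
    ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "age-metric-refresher");
      t.setDaemon(true); // do not keep the JVM alive just for metrics
      return t;
    });
    pool.scheduleAtFixedRate(metric::refresh, periodMs, periodMs, TimeUnit.MILLISECONDS);
    return pool;
  }
}
```

Because refresh() runs on its own thread, a worker blocked on a lock stops
calling recordEvent() and the published age rises on every refresh, which is
the signal an alert needs. In HBase the refresher would presumably be written
as a ScheduledChore rather than a raw executor.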
--
This message was sent by Atlassian Jira
(v8.3.4#803005)