[
https://issues.apache.org/jira/browse/HBASE-24543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Kyle Purtell updated HBASE-24543:
----------------------------------------
Description:
ScheduledChore logs at DEBUG level the execution time of each chore.
We used to log an average execution time across all chores every five minutes,
which by consensus was judged to not be useful. Derived metrics like averages
or histograms should be calculated per chore. So we modified the logging to
dump the chore execution time each time it runs, to facilitate such
calculations with the log aggregation and searching tool of choice. Per chore
execution logging is more useful, in that sense, but may be too chatty. This is
not unexpected but let me provide my observations so we can revisit this.
On the master, for example, this is logged every second:
{noformat}
2020-06-11 16:35:28,263 DEBUG
[master/apurtell-ltm:8100.splitLogManager..Chore.1] hbase.ScheduledChore:
SplitLogManager Timeout Monitor execution time: 0 ms.
{noformat}
Does the value of these lines outweigh the cost of 86,400 log lines per day per
master instance? (At least.)
On the regionserver it is somewhat better, these are logged every 10 seconds:
{noformat}
2020-06-11 16:37:57,203 DEBUG [regionserver/apurtell-ltm:8120.Chore.1]
hbase.ScheduledChore: CompactionChecker execution time: 0 ms.
2020-06-11 16:37:57,203 DEBUG [regionserver/apurtell-ltm:8120.Chore.1]
hbase.ScheduledChore: MemstoreFlusherChore execution time: 0 ms.
{noformat}
So that will be 17,280 log lines per day per regionserver. (At least.)
These could be moved to TRACE level, perhaps.
I propose we replace this logging with histogram metrics. There should be a
separate metric for each distinct chore classname, allocated as needed.
was:
ScheduledChore logs at DEBUG level the execution time of each chore.
We used to log an average execution time across all chores every five minutes,
which by consensus was judged to not be useful. Derived metrics like averages
or histograms should be calculated per chore. So we modified the logging to
dump the chore execution time each time it runs, to facilitate such
calculations with the log aggregation and searching tool of choice. Per chore
execution logging is more useful, in that sense, but may be too chatty. This is
not unexpected but let me provide my observations so we can revisit this.
On the master, for example, this is logged every second:
{noformat}
2020-06-11 16:35:28,263 DEBUG
[master/apurtell-ltm:8100.splitLogManager..Chore.1] hbase.ScheduledChore:
SplitLogManager Timeout Monitor execution time: 0 ms.
{noformat}
Does the value of these lines outweigh the cost of 86,400 log lines per day per
master instance? (At least.)
On the regionserver it is somewhat better, these are logged every 10 seconds:
{noformat}
2020-06-11 16:37:57,203 DEBUG [regionserver/apurtell-ltm:8120.Chore.1]
hbase.ScheduledChore: CompactionChecker execution time: 0 ms.
2020-06-11 16:37:57,203 DEBUG [regionserver/apurtell-ltm:8120.Chore.1]
hbase.ScheduledChore: MemstoreFlusherChore execution time: 0 ms.
{noformat}
So that will be 17,280 log lines per day per regionserver. (At least.)
Perhaps these should be moved to TRACE level.
We should definitely replace this logging with histogram metrics. There should
be a separate metric for each distinct chore classname, allocated as needed.
> ScheduledChore logging is too chatty, replace with metrics
> ----------------------------------------------------------
>
> Key: HBASE-24543
> URL: https://issues.apache.org/jira/browse/HBASE-24543
> Project: HBase
> Issue Type: Improvement
> Components: metrics, Operability
> Reporter: Andrew Kyle Purtell
> Priority: Minor
>
> ScheduledChore logs at DEBUG level the execution time of each chore.
> We used to log an average execution time across all chores every five
> minutes, which by consensus was judged to not be useful. Derived metrics like
> averages or histograms should be calculated per chore. So we modified the
> logging to dump the chore execution time each time it runs, to facilitate
> such calculations with the log aggregation and searching tool of choice. Per
> chore execution logging is more useful, in that sense, but may be too chatty.
> This is not unexpected but let me provide my observations so we can revisit
> this.
> On the master, for example, this is logged every second:
> {noformat}
> 2020-06-11 16:35:28,263 DEBUG
> [master/apurtell-ltm:8100.splitLogManager..Chore.1] hbase.ScheduledChore:
> SplitLogManager Timeout Monitor execution time: 0 ms.
> {noformat}
> Does the value of these lines outweigh the cost of 86,400 log lines per day
> per master instance? (At least.)
> On the regionserver it is somewhat better, these are logged every 10 seconds:
> {noformat}
> 2020-06-11 16:37:57,203 DEBUG [regionserver/apurtell-ltm:8120.Chore.1]
> hbase.ScheduledChore: CompactionChecker execution time: 0 ms.
> 2020-06-11 16:37:57,203 DEBUG [regionserver/apurtell-ltm:8120.Chore.1]
> hbase.ScheduledChore: MemstoreFlusherChore execution time: 0 ms.
> {noformat}
> So that will be 17,280 log lines per day per regionserver. (At least.)
> These could be moved to TRACE level, perhaps.
> I propose we replace this logging with histogram metrics. There should be a
> separate metric for each distinct chore classname, allocated as needed.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)