[ 
https://issues.apache.org/jira/browse/HBASE-24543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Pal reassigned HBASE-24543:
-----------------------------------

    Assignee: Sandeep Pal

> ScheduledChore logging is too chatty, replace with metrics
> ----------------------------------------------------------
>
>                 Key: HBASE-24543
>                 URL: https://issues.apache.org/jira/browse/HBASE-24543
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics, Operability
>            Reporter: Andrew Kyle Purtell
>            Assignee: Sandeep Pal
>            Priority: Minor
>
> ScheduledChore logs at DEBUG level the execution time of each chore. 
> We used to log an average execution time across all chores every five 
> minutes, which by consensus was judged to not be useful. Derived metrics like 
> averages or histograms should be calculated per chore. So we modified the 
> logging to dump the chore execution time each time it runs, to facilitate 
> such calculations with the log aggregation and searching tool of choice. Per 
> chore execution logging is more useful, in that sense, but may be too chatty. 
> This is not unexpected but let me provide my observations so we can revisit 
> this.
> On the master, for example, this is logged every second:
> {noformat}
> 2020-06-11 16:35:28,263 DEBUG 
> [master/apurtell-ltm:8100.splitLogManager..Chore.1] hbase.ScheduledChore: 
> SplitLogManager Timeout Monitor execution time: 0 ms.
> {noformat}
> Does the value of these lines outweigh the cost of 86,400 log lines per day 
> per master instance? (At least.)
> On the regionserver it is somewhat better, these are logged every 10 seconds:
> {noformat}
> 2020-06-11 16:37:57,203 DEBUG [regionserver/apurtell-ltm:8120.Chore.1] 
> hbase.ScheduledChore: CompactionChecker execution time: 0 ms.
> 2020-06-11 16:37:57,203 DEBUG [regionserver/apurtell-ltm:8120.Chore.1] 
> hbase.ScheduledChore: MemstoreFlusherChore execution time: 0 ms.
> {noformat}
> So that will be 17,280 log lines per day per regionserver. (At least.)
> These could be moved to TRACE level, perhaps. 
> I propose we replace this logging with histogram metrics. There should be a 
> separate metric for each distinct chore classname, allocated as needed. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to